Core Management
The Document class is the central API for loading Word documents and converting them to other formats. This page covers format conversion workflows, save-options configuration, and text extraction.
Loading and Saving
Load a document with Document() and call save() with a SaveFormat constant to convert to the target output format. Supported inputs: DOCX, DOC, RTF, TXT, Markdown. Supported outputs: PDF, Markdown, TXT.
import aspose.words_foss as aw
doc = aw.Document("input.docx")
doc.save("output.md", aw.SaveFormat.MARKDOWN)
doc.save("output.pdf", aw.SaveFormat.PDF)
doc.save("output.txt", aw.SaveFormat.TEXT)
Call save() multiple times on the same Document to produce multiple output formats without reloading.
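The repeated-save pattern can be wrapped in a small helper. The sketch below is illustrative only: export_all and FORMAT_BY_EXTENSION are hypothetical names, not part of the library, and the helper takes the document and the SaveFormat class as arguments so that it relies on nothing beyond the save() call and the three constants listed on this page.

```python
from pathlib import Path

# Extension -> SaveFormat attribute name, using the constants named on this page.
FORMAT_BY_EXTENSION = {".pdf": "PDF", ".md": "MARKDOWN", ".txt": "TEXT"}


def export_all(doc, save_format, stem, out_dir="."):
    """Save one already-loaded document once per supported output format.

    doc         -- any object exposing save(path, format), e.g. aw.Document
    save_format -- the class holding the format constants, e.g. aw.SaveFormat
    stem        -- output file name without extension
    """
    written = []
    for ext, name in FORMAT_BY_EXTENSION.items():
        target = Path(out_dir) / f"{stem}{ext}"
        # Look up the constant by name (PDF, MARKDOWN, TEXT) and reuse the same doc.
        doc.save(str(target), getattr(save_format, name))
        written.append(target)
    return written


# Hypothetical usage, assuming the package is installed:
#   import aspose.words_foss as aw
#   export_all(aw.Document("input.docx"), aw.SaveFormat, "output")
```

The document is loaded once and only the target path and format change per call, which is the point of the paragraph above.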
PDF Export with PdfSaveOptions
For default PDF output, pass SaveFormat.PDF. For fine-grained control, use a PdfSaveOptions object:
import aspose.words_foss as aw
from aspose.words_foss.saving import PdfSaveOptions
doc = aw.Document("input.docx")
# Default PDF export
doc.save("default.pdf", aw.SaveFormat.PDF)
# Customized PDF export with save options
doc.save("custom.pdf", PdfSaveOptions())
PdfSaveOptions defines properties for JPEG image quality, PDF standards compliance, font embedding, and other settings.
Note: PdfSaveOptions properties (compliance, JPEG quality, font embedding mode, image compression, etc.) are defined for API forward-compatibility but are not yet consumed by the PDF writer. Setting them currently has no effect on output.
Markdown Export with MarkdownSaveOptions
For default Markdown output, pass SaveFormat.MARKDOWN. Use MarkdownSaveOptions when you need to control formatting behavior:
import aspose.words_foss as aw
from aspose.words_foss.saving import MarkdownSaveOptions
doc = aw.Document("input.docx")
# Default Markdown export
doc.save("default.md", aw.SaveFormat.MARKDOWN)
# Customized Markdown export with save options
doc.save("with_options.md", MarkdownSaveOptions())
MarkdownSaveOptions controls whether underline formatting is preserved in the output.
Note: Only export_underline_formatting is currently applied during Markdown export. Other MarkdownSaveOptions properties (table_content_alignment, list_export_mode, export_images_as_base64, images_folder, images_folder_alias) are defined for API forward-compatibility but are not yet consumed by the Markdown writer.
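As a rough illustration of the one option that is applied, the sketch below sets export_underline_formatting before saving. It assumes the property is a writable boolean, which this page does not state; save_markdown_with_underline is a hypothetical helper name. The options object is passed in, so the helper depends only on the save(path, options) call shown above.

```python
def save_markdown_with_underline(doc, options, path):
    """Save doc as Markdown with underline formatting export switched on.

    Assumption: export_underline_formatting accepts True/False.
    """
    # The only MarkdownSaveOptions property the note above says is honored.
    options.export_underline_formatting = True
    doc.save(path, options)
    return options


# Hypothetical usage, assuming the package is installed:
#   import aspose.words_foss as aw
#   from aspose.words_foss.saving import MarkdownSaveOptions
#   save_markdown_with_underline(
#       aw.Document("input.docx"), MarkdownSaveOptions(), "underlined.md"
#   )
```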
Text Extraction
Extract plain text from any loaded document with get_text():
import aspose.words_foss as aw
doc = aw.Document("input.docx")
text = doc.get_text()
For text file output, use SaveFormat.TEXT:
doc.save("output.txt", aw.SaveFormat.TEXT)
Common Issues
| Issue | Cause | Fix |
|---|---|---|
| ModuleNotFoundError | Package not installed | Run pip install "aspose-words-foss>=26.4.0" (quote the requirement so the shell does not treat >= as a redirect) |
| Empty text from get_text() | Input file is empty or corrupted | Verify the input file opens correctly in a word processor |
| PDF output missing images | Image format not supported by the converter | Use a DOCX input with standard embedded images |
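The empty-text row in the table can be narrowed down before opening a word processor. The sketch below is a hypothetical helper (diagnose_empty_text is not a library function) that uses only the standard library. It assumes get_text() returns a str and takes that string as an argument rather than calling the library itself.

```python
from pathlib import Path


def diagnose_empty_text(path, text):
    """Return a short hint about why extracted text might be empty."""
    source = Path(path)
    if not source.is_file():
        return "input file not found"
    if source.stat().st_size == 0:
        return "input file is zero bytes"
    # strip() also removes carriage returns, newlines, and form feeds.
    if not text.strip():
        return "no extractable text"
    return "ok"


# Hypothetical usage, assuming the package is installed:
#   import aspose.words_foss as aw
#   doc = aw.Document("input.docx")
#   print(diagnose_empty_text("input.docx", doc.get_text()))
```

A result of "no extractable text" for a non-empty file is the case where checking the file in a word processor, as the table suggests, is the next step.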
API Reference Summary
| Class / Method | Description |
|---|---|
| Document | Load Word documents from DOCX, DOC, RTF, TXT, or Markdown |
| Document.save() | Save to PDF, Markdown, or plain text |
| Document.get_text() | Extract plain text content |
| SaveFormat | Constants: PDF, MARKDOWN, TEXT |
| PdfSaveOptions | PDF export configuration (properties are forward-compatibility stubs; not yet applied) |
| MarkdownSaveOptions | Configure underline formatting export |