Core Management
The Document class is the central API for loading Word documents and converting them to other formats. This page covers format conversion workflows, save-options configuration, and text extraction.
Loading and Saving
Load a document with Document() and call save() with a SaveFormat constant to convert to the target output format. Supported inputs: DOCX, DOC, RTF, TXT, Markdown. Supported outputs: PDF, Markdown, TXT.
import aspose.words_foss as aw
doc = aw.Document("input.docx")
doc.save("output.md", aw.SaveFormat.MARKDOWN)
doc.save("output.pdf", aw.SaveFormat.PDF)
doc.save("output.txt", aw.SaveFormat.TEXT)
Call save() multiple times on the same Document to produce multiple output formats without reloading.
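The multi-format pattern can be wrapped in a small helper. The following is a minimal sketch, not part of the library: save_all and its formats mapping are hypothetical names, and the helper assumes only that the object passed in exposes a save(path, format) method like the one used above.

```python
from pathlib import Path


def save_all(doc, stem, formats, out_dir="."):
    """Save one loaded document to several formats without reloading it.

    doc     -- any object exposing save(path, save_format), e.g. a loaded Document
    stem    -- output file name without extension, e.g. "report"
    formats -- mapping of file extension to save-format constant,
               e.g. {".pdf": aw.SaveFormat.PDF, ".md": aw.SaveFormat.MARKDOWN}
    Returns the list of target paths, in mapping order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    written = []
    for ext, save_format in formats.items():
        target = out_dir / f"{stem}{ext}"
        doc.save(str(target), save_format)  # same Document, new format each pass
        written.append(target)
    return written
```

With a real document this might be called as save_all(aw.Document("input.docx"), "output", {".pdf": aw.SaveFormat.PDF, ".md": aw.SaveFormat.MARKDOWN, ".txt": aw.SaveFormat.TEXT}).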
PDF Export with PdfSaveOptions
For default PDF output, pass SaveFormat.PDF. For fine-grained control, use a PdfSaveOptions object:
import aspose.words_foss as aw
from aspose.words_foss.saving import PdfSaveOptions
doc = aw.Document("input.docx")
# Default PDF export
doc.save("default.pdf", aw.SaveFormat.PDF)
# Customized PDF export with save options
doc.save("custom.pdf", PdfSaveOptions())
PdfSaveOptions accepts settings for JPEG image quality (0–100, default 100) and the PDF standards compliance level (default: PDF 1.7).
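This page gives the valid JPEG quality range but not the attribute that carries it. The sketch below guards the documented 0–100 range before a value reaches the exporter. The helper set_jpeg_quality is hypothetical, and the attribute name jpeg_quality is an assumption that this page does not confirm. It is passed as a parameter so it can be swapped for whatever name the installed version exposes, and the helper fails loudly if the name is wrong.

```python
def set_jpeg_quality(options, quality, attr="jpeg_quality"):
    """Validate a JPEG quality value and store it on a save-options object.

    attr is an ASSUMED attribute name; check the installed package for the real one.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(quality, bool) or not isinstance(quality, int):
        raise TypeError("quality must be an integer")
    if not 0 <= quality <= 100:
        raise ValueError(f"quality must be between 0 and 100, got {quality}")
    if not hasattr(options, attr):
        # Avoid silently creating a new attribute that the exporter would ignore.
        raise AttributeError(f"options object has no attribute {attr!r}")
    setattr(options, attr, quality)
    return options
```

Checking hasattr first means a wrong guess at the attribute name raises an error instead of producing a PDF with unchanged settings.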
Markdown Export with MarkdownSaveOptions
For default Markdown output, pass SaveFormat.MARKDOWN. Use MarkdownSaveOptions when you need to control formatting behavior:
import aspose.words_foss as aw
from aspose.words_foss.saving import MarkdownSaveOptions
doc = aw.Document("input.docx")
# Default Markdown export
doc.save("default.md", aw.SaveFormat.MARKDOWN)
# Customized Markdown export with save options
doc.save("with_options.md", MarkdownSaveOptions())
MarkdownSaveOptions controls whether underline formatting is preserved in the output.
Text Extraction
Extract plain text from any loaded document with get_text():
import aspose.words_foss as aw
doc = aw.Document("input.docx")
text = doc.get_text()
For text file output, use SaveFormat.TEXT:
doc.save("output.txt", aw.SaveFormat.TEXT)
Common Issues
| Issue | Cause | Fix |
|---|---|---|
| ModuleNotFoundError | Package not installed | Run pip install "aspose-words-foss>=26.4.0" (quote the requirement so the shell does not treat >= as a redirect) |
| Empty text from get_text() | Input file is empty or corrupted | Verify the input file opens correctly in a word processor |
| PDF output missing images | Image format not supported by the converter | Use a DOCX input with standard embedded images |
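The first two rows of the table can be checked before any conversion is attempted. The following is a minimal sketch using only the standard library. The helper names package_available, input_problem, and text_is_empty are hypothetical, and the checks catch only the obvious cases (missing package, missing or zero-byte file, whitespace-only text), not file corruption.

```python
import importlib.util
from pathlib import Path


def package_available(name="aspose.words_foss"):
    """Return True if the module can be located on the import path."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (here "aspose") is missing or malformed.
        return False


def input_problem(path):
    """Return a short description of an obvious input-file problem, or None."""
    p = Path(path)
    if not p.is_file():
        return f"{p} does not exist or is not a file"
    if p.stat().st_size == 0:
        return f"{p} is empty (0 bytes)"
    return None


def text_is_empty(text):
    """True when extracted text contains nothing but whitespace."""
    return not text or not text.strip()
```

A caller might run package_available() and input_problem("input.docx") before loading, then pass the result of doc.get_text() to text_is_empty() to decide whether to inspect the file by hand.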
API Reference Summary
| Class / Method | Description |
|---|---|
| Document | Load Word documents from DOCX, DOC, RTF, TXT, or Markdown |
| Document.save() | Save to PDF, Markdown, or plain text |
| Document.get_text() | Extract plain text content |
| SaveFormat | Constants: PDF, MARKDOWN, TEXT |
| PdfSaveOptions | Configure PDF compliance and JPEG quality |
| MarkdownSaveOptions | Configure underline formatting export |
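The supported input and output formats listed above can be turned into a pre-flight check on file names. This is a minimal sketch, not library behavior: check_conversion is a hypothetical helper, and the extension sets are assumptions derived from the format names on this page (for example, .md for Markdown).

```python
from pathlib import Path

# Extensions are ASSUMED from the format names listed on this page.
INPUT_EXTENSIONS = {".docx", ".doc", ".rtf", ".txt", ".md"}
OUTPUT_EXTENSIONS = {".pdf", ".md", ".txt"}


def check_conversion(src, dst):
    """Raise ValueError if the src or dst extension is outside the listed formats."""
    src_ext = Path(src).suffix.lower()
    dst_ext = Path(dst).suffix.lower()
    if src_ext not in INPUT_EXTENSIONS:
        raise ValueError(f"unsupported input extension: {src_ext or '(none)'}")
    if dst_ext not in OUTPUT_EXTENSIONS:
        raise ValueError(f"unsupported output extension: {dst_ext or '(none)'}")
    return src_ext, dst_ext
```

Calling check_conversion("input.docx", "output.pdf") before Document() and save() reports a mismatched file name early, with a message that names the offending extension.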