Core Management

The Document class is the central API for loading Word documents and converting them to other formats. This page covers format conversion workflows, save-options configuration, and text extraction.


Loading and Saving

Load a document with Document() and call save() with a SaveFormat constant to convert to the target output format. Supported inputs: DOCX, DOC, RTF, TXT, Markdown. Supported outputs: PDF, Markdown, TXT.

import aspose.words_foss as aw

doc = aw.Document("input.docx")
doc.save("output.md", aw.SaveFormat.MARKDOWN)
doc.save("output.pdf", aw.SaveFormat.PDF)
doc.save("output.txt", aw.SaveFormat.TEXT)

Call save() multiple times on the same Document to produce multiple output formats without reloading.
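The load-once, save-many pattern described above can be wrapped in a small helper. This is a minimal sketch, not part of the library: output_targets and convert_all are hypothetical names, and the extension mapping assumes only the three SaveFormat constants listed on this page. The aspose.words_foss import is deferred to the function body so the path logic can be run on its own.

```python
from pathlib import Path

# File extension -> name of the SaveFormat constant documented on this page.
FORMATS = {".pdf": "PDF", ".md": "MARKDOWN", ".txt": "TEXT"}


def output_targets(input_path, out_dir):
    """Return {SaveFormat constant name: output path} for one input file."""
    stem = Path(input_path).stem
    return {name: Path(out_dir) / f"{stem}{ext}" for ext, name in FORMATS.items()}


def convert_all(input_path, out_dir):
    """Load the document once, then save it in each supported output format."""
    import aspose.words_foss as aw  # deferred: needs the package installed

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    doc = aw.Document(str(input_path))
    targets = output_targets(input_path, out_dir)
    for name, path in targets.items():
        # Look up the constant by name, e.g. aw.SaveFormat.PDF.
        doc.save(str(path), getattr(aw.SaveFormat, name))
    return targets
```

Calling convert_all("input.docx", "out") would then be expected to write out/input.pdf, out/input.md, and out/input.txt from a single load.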


PDF Export with PdfSaveOptions

For default PDF output, pass SaveFormat.PDF. To supply an explicit options object instead, pass a PdfSaveOptions instance to save():

import aspose.words_foss as aw
from aspose.words_foss.saving import PdfSaveOptions

doc = aw.Document("input.docx")

# Default PDF export
doc.save("default.pdf", aw.SaveFormat.PDF)

# Customized PDF export with save options
doc.save("custom.pdf", PdfSaveOptions())

PdfSaveOptions defines properties for JPEG image quality, PDF standards compliance, font embedding, and other settings.

Note: PdfSaveOptions properties (compliance, JPEG quality, font embedding mode, image compression, etc.) are defined for API forward-compatibility but are not yet consumed by the PDF writer. Setting them currently has no effect on output.


Markdown Export with MarkdownSaveOptions

For default Markdown output, pass SaveFormat.MARKDOWN. Use MarkdownSaveOptions when you need to control formatting behavior:

import aspose.words_foss as aw
from aspose.words_foss.saving import MarkdownSaveOptions

doc = aw.Document("input.docx")

# Default Markdown export
doc.save("default.md", aw.SaveFormat.MARKDOWN)

# Customized Markdown export with save options
doc.save("with_options.md", MarkdownSaveOptions())

MarkdownSaveOptions controls whether underline formatting is preserved in the Markdown output.

Note: Only export_underline_formatting is currently applied during Markdown export. Other MarkdownSaveOptions properties (table_content_alignment, list_export_mode, export_images_as_base64, images_folder, images_folder_alias) are defined for API forward-compatibility but are not yet consumed by the Markdown writer.
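Since export_underline_formatting is the one option currently applied, a helper that sets it might look like the sketch below. The function name save_markdown_keep_underline is hypothetical, and the code assumes the property is a plain boolean attribute that can be assigned after construction; check the package's own reference before relying on that.

```python
def save_markdown_keep_underline(doc, path, options=None):
    """Save `doc` as Markdown with export_underline_formatting switched on."""
    if options is None:
        # Deferred import so the helper can be exercised without the package.
        from aspose.words_foss.saving import MarkdownSaveOptions

        options = MarkdownSaveOptions()
    # Assumption: the property is a simple boolean attribute.
    options.export_underline_formatting = True
    doc.save(path, options)
    return options
```

Passing a pre-built options object is optional; it mainly makes the helper easy to test with a stand-in object.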


Text Extraction

Extract plain text from any loaded document with get_text():

import aspose.words_foss as aw

doc = aw.Document("input.docx")
text = doc.get_text()

For text file output, use SaveFormat.TEXT:

doc.save("output.txt", aw.SaveFormat.TEXT)
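When extraction feeds a later processing step, it can help to reject empty results early. The sketch below is illustrative only: has_visible_text and extract_text_or_fail are hypothetical helpers, and treating whitespace-only output as empty is an assumption rather than documented library behavior.

```python
def has_visible_text(text):
    """Return True if `text` contains at least one non-whitespace character."""
    return bool(text.strip())


def extract_text_or_fail(path):
    """Load `path`, extract its text, and raise if nothing visible came back."""
    import aspose.words_foss as aw  # deferred: needs the package installed

    text = aw.Document(path).get_text()
    if not has_visible_text(text):
        raise ValueError(f"No text extracted from {path!r}")
    return text
```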

Common Issues

| Issue | Cause | Fix |
| --- | --- | --- |
| ModuleNotFoundError | Package not installed | Run pip install "aspose-words-foss>=26.4.0" (quote the specifier so the shell does not treat >= as a redirect) |
| Empty text from get_text() | Input file is empty or corrupted | Verify the input file opens correctly in a word processor |
| PDF output missing images | Image format not supported by the converter | Use a DOCX input with standard embedded images |
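To diagnose the ModuleNotFoundError case before running a conversion, a script can check whether the module is importable. This is a generic sketch using the standard library's importlib; is_importable is a hypothetical helper name, and the default module name is taken from the import statements on this page.

```python
import importlib.util


def is_importable(module_name="aspose.words_foss"):
    """Return True if `module_name` can be located by the import system."""
    try:
        # For dotted names, find_spec imports the parent package first and
        # raises ModuleNotFoundError if that parent is missing.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    if not is_importable():
        print('Package missing; try: pip install "aspose-words-foss>=26.4.0"')
```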

API Reference Summary

| Class / Method | Description |
| --- | --- |
| Document | Load Word documents from DOCX, DOC, RTF, TXT, or Markdown |
| Document.save() | Save to PDF, Markdown, or plain text |
| Document.get_text() | Extract plain text content |
| SaveFormat | Constants: PDF, MARKDOWN, TEXT |
| PdfSaveOptions | PDF export configuration (properties are forward-compatibility stubs; not yet applied) |
| MarkdownSaveOptions | Configure underline formatting export |
