Core Management

The Document class is the central API for loading Word documents and converting them to other formats. This page covers format conversion workflows, save-options configuration, and text extraction.


Loading and Saving

Load a document with Document() and call save() with a SaveFormat constant to convert to the target output format. Supported inputs: DOCX, DOC, RTF, TXT, Markdown. Supported outputs: PDF, Markdown, TXT.

import aspose.words_foss as aw

doc = aw.Document("input.docx")
doc.save("output.md", aw.SaveFormat.MARKDOWN)
doc.save("output.pdf", aw.SaveFormat.PDF)
doc.save("output.txt", aw.SaveFormat.TEXT)

Call save() multiple times on the same Document to produce multiple output formats without reloading.
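The load-once, save-many pattern can be wrapped in a small helper. What follows is a minimal sketch rather than part of the library: planned_outputs and convert_all are hypothetical names, and the extension mapping only mirrors the three SaveFormat constants named on this page. The package import is deferred so that output paths can be planned even where the package is not installed.

```python
from pathlib import Path

# Output extension -> name of the SaveFormat constant used on this page.
FORMAT_NAMES = {".pdf": "PDF", ".md": "MARKDOWN", ".txt": "TEXT"}


def planned_outputs(input_path, out_dir="."):
    """Return {output path: SaveFormat constant name} for one input file."""
    stem = Path(input_path).stem
    return {Path(out_dir) / f"{stem}{ext}": name for ext, name in FORMAT_NAMES.items()}


def convert_all(input_path, out_dir="."):
    """Load the document once and save it in every planned format."""
    import aspose.words_foss as aw  # deferred: only needed for real conversion

    plan = planned_outputs(input_path, out_dir)
    doc = aw.Document(str(input_path))
    for path, name in plan.items():
        # Look up the constant by name, e.g. aw.SaveFormat.PDF
        doc.save(str(path), getattr(aw.SaveFormat, name))
    return list(plan)
```

Calling planned_outputs("input.docx", "out") only computes paths, which makes it easy to check for name collisions before any file is written.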


PDF Export with PdfSaveOptions

For default PDF output, pass SaveFormat.PDF. For fine-grained control, use a PdfSaveOptions object:

import aspose.words_foss as aw
from aspose.words_foss.saving import PdfSaveOptions

doc = aw.Document("input.docx")

# Default PDF export
doc.save("default.pdf", aw.SaveFormat.PDF)

# PDF export through a PdfSaveOptions object (configure it before saving)
doc.save("custom.pdf", PdfSaveOptions())

PdfSaveOptions accepts settings for JPEG image quality (0–100, default 100) and PDF standards compliance level (default PDF/1.7).
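This page does not name the attributes that hold these settings, so the sketch below stays library-agnostic. It validates a JPEG quality value against the documented 0–100 range and sets an option only if the options object already exposes an attribute of that name, which avoids silently creating a misspelled one. Both helper names are hypothetical, and any attribute name passed to set_option (for example jpeg_quality) is an assumption to confirm against the package's API reference.

```python
def validate_jpeg_quality(value):
    """Return value unchanged if it is an int in the documented 0-100 range."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("JPEG quality must be an integer")
    if not 0 <= value <= 100:
        raise ValueError("JPEG quality must be between 0 and 100")
    return value


def set_option(options, name, value):
    """Set an existing attribute on a save-options object; reject unknown names."""
    if not hasattr(options, name):
        # Plain setattr would quietly add a new, ignored attribute instead.
        raise AttributeError(f"{type(options).__name__} has no option {name!r}")
    setattr(options, name, value)
    return options
```

With the package installed, a call might look like set_option(PdfSaveOptions(), "jpeg_quality", validate_jpeg_quality(80)), followed by doc.save("custom.pdf", options). If the attribute name is wrong, the helper raises instead of producing a PDF with default settings.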


Markdown Export with MarkdownSaveOptions

For default Markdown output, pass SaveFormat.MARKDOWN. Use MarkdownSaveOptions when you need to control formatting behavior:

import aspose.words_foss as aw
from aspose.words_foss.saving import MarkdownSaveOptions

doc = aw.Document("input.docx")

# Default Markdown export
doc.save("default.md", aw.SaveFormat.MARKDOWN)

# Markdown export through a MarkdownSaveOptions object (configure it before saving)
doc.save("with_options.md", MarkdownSaveOptions())

MarkdownSaveOptions supports controlling underline formatting preservation in the output.


Text Extraction

Extract plain text from any loaded document with get_text():

import aspose.words_foss as aw

doc = aw.Document("input.docx")
text = doc.get_text()

For text file output, use SaveFormat.TEXT:

doc.save("output.txt", aw.SaveFormat.TEXT)
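Once the text is available as a Python string, ordinary string handling applies. As a minimal sketch, the hypothetical helper text_stats below summarizes a string such as the value returned by get_text(); it makes no assumptions about the library beyond the result being a str, and its is_empty flag is a quick way to spot the empty-result case listed under Common Issues.

```python
def text_stats(text):
    """Return simple statistics for a string such as the result of doc.get_text()."""
    non_empty = [line for line in text.splitlines() if line.strip()]
    return {
        "characters": len(text),
        "words": len(text.split()),
        "non_empty_lines": len(non_empty),
        "is_empty": not text.strip(),
    }
```

A typical call would be text_stats(doc.get_text()) right after loading the document.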

Common Issues

Issue | Cause | Fix
ModuleNotFoundError | Package not installed | Run pip install "aspose-words-foss>=26.4.0" (quote the requirement so the shell does not treat >= as a redirect)
Empty text from get_text() | Input file is empty or corrupted | Verify the input file opens correctly in a word processor
PDF output missing images | Image format not supported by the converter | Use a DOCX input with standard embedded images
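The first two issues can be caught before any conversion starts. The following is a sketch of a preflight check using only the standard library; package_available and input_problem are hypothetical helper names, and the module name aspose.words_foss is taken from the import statements on this page.

```python
import importlib.util
import os


def package_available(module_name="aspose.words_foss"):
    """Return True if the module can be located on the import path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised when a parent package (here "aspose") is itself missing.
        return False


def input_problem(path):
    """Return a short description of an obvious input problem, or None."""
    if not os.path.isfile(path):
        return "file not found"
    if os.path.getsize(path) == 0:
        return "file is empty"
    return None
```

A script could call package_available() first and print the pip command from the table when it returns False, then call input_problem("input.docx") before loading. Neither check detects a corrupted file; that still requires opening it in a word processor.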

API Reference Summary

Class / Method | Description
Document | Load Word documents from DOCX, DOC, RTF, TXT, or Markdown
Document.save() | Save to PDF, Markdown, or plain text
Document.get_text() | Extract plain text content
SaveFormat | Constants: PDF, MARKDOWN, TEXT
PdfSaveOptions | Configure PDF compliance and JPEG quality
MarkdownSaveOptions | Configure underline formatting export