Parsers
Internal API: The classes on this page are not part of the public API and are not exported from the top-level aspose.words_foss package. They are used internally by the document conversion pipeline. Most developers should use Document.save() and Document.get_text() instead. See Core Management for the public API. These classes are implementation details and may change without notice.
Aspose.Words FOSS for Python includes specialized parsers that extract structured data from DOCX internals. The NumberingParser handles list numbering definitions, and StyleParser extracts document styles.
Numbering Parser
NumberingParser reads the numbering definitions from a DOCX package and exposes them through a query API. Use parse_numbering_part() to load numbering XML, then query list properties with helper methods.
| Method | Description |
|---|---|
| NumberingParser.parse_numbering_part() | Parse the DOCX numbering element |
| NumberingParser.get_list_info() | Get information about a specific list by ID |
| NumberingParser.get_level_info() | Get level details for a list at a given depth |
| NumberingParser.is_ordered_list() | Check whether a list level is ordered or bulleted |
| NumberingParser.get_start_value() | Get the starting number for a list level |
| NumberingParser.get_delimiter() | Get the delimiter string for a list level |
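To make the kind of data involved more concrete, the following is a minimal standard-library sketch of reading WordprocessingML numbering definitions. It is not the NumberingParser implementation and does not call it, since the signatures of these internal methods are not documented on this page. The sample XML is hand-written, and `load_numbering`, `attr`, and `is_ordered` are hypothetical helper names introduced only for illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hand-written sample resembling word/numbering.xml (not from a real document).
SAMPLE = f"""<w:numbering xmlns:w="{W}">
  <w:abstractNum w:abstractNumId="0">
    <w:lvl w:ilvl="0">
      <w:start w:val="1"/>
      <w:numFmt w:val="decimal"/>
      <w:lvlText w:val="%1."/>
    </w:lvl>
    <w:lvl w:ilvl="1">
      <w:start w:val="1"/>
      <w:numFmt w:val="bullet"/>
      <w:lvlText w:val="o"/>
    </w:lvl>
  </w:abstractNum>
  <w:num w:numId="1">
    <w:abstractNumId w:val="0"/>
  </w:num>
</w:numbering>"""


def attr(element, name):
    """Read a w:-namespaced attribute, or None if the element is missing."""
    return None if element is None else element.get(f"{{{W}}}{name}")


def load_numbering(xml_text):
    """Return {num_id: {level_index: level_dict}} for each w:num instance."""
    root = ET.fromstring(xml_text)

    # Abstract definitions hold the actual per-level formatting.
    abstract = {}
    for abs_num in root.findall("w:abstractNum", NS):
        levels = {}
        for lvl in abs_num.findall("w:lvl", NS):
            levels[int(attr(lvl, "ilvl"))] = {
                "format": attr(lvl.find("w:numFmt", NS), "val"),
                "start": int(attr(lvl.find("w:start", NS), "val") or 1),
                "text": attr(lvl.find("w:lvlText", NS), "val"),
            }
        abstract[attr(abs_num, "abstractNumId")] = levels

    # Each w:num instance points at one abstract definition.
    lists = {}
    for num in root.findall("w:num", NS):
        abs_id = attr(num.find("w:abstractNumId", NS), "val")
        lists[attr(num, "numId")] = abstract.get(abs_id, {})
    return lists


def is_ordered(level):
    """Assumption: any format other than 'bullet' or 'none' counts as ordered."""
    return level["format"] not in ("bullet", "none", None)


lists = load_numbering(SAMPLE)
top = lists["1"][0]      # level indexes are zero-based (w:ilvl)
nested = lists["1"][1]
print(is_ordered(top), top["start"], top["text"])
print(is_ordered(nested))
```

The sketch resolves each list instance to its abstract definition once, so later lookups by list ID and level are plain dictionary reads.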
Style Parser
StyleParser parses style names into structured ParsedStyle objects. It identifies headings, blockquotes, code blocks, and list paragraphs from DOCX style names.
| Method | Description |
|---|---|
| StyleParser.parse() | Parse a style name into a ParsedStyle object |
| StyleParser.get_style_chain() | Parse a chain of style names for inherited styles |
| StyleParser.is_setext_heading() | Check if a style is a Setext-style heading |
| StyleParser.extract_all_styles() | Extract individual style names from a comma-separated chain |
Numbering Data Model
Parsed numbering data is stored in structured objects:
| Class | Description |
|---|---|
| NumberingInfo | Numbering definition with num_id, abstract_num_id, and levels |
| NumberingLevel | Level definition with format, start, and text properties |
| ListInfo | Information about a specific list instance |
| ListLevelInfo | Level-specific formatting details |
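As a rough picture of how such objects might fit together, here is a small dataclass sketch built only from the field names listed in the table. The class names carry a `Sketch` suffix to mark them as hypothetical. The field types, the default values, the dictionary keyed by level index, and the `level()` helper are all assumptions, not the library's definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NumberingLevelSketch:
    """Assumed shape of a level: number format, start value, and level text."""
    format: str = "decimal"
    start: int = 1
    text: str = "%1."


@dataclass
class NumberingInfoSketch:
    """Assumed shape of a numbering definition, keyed by zero-based level index."""
    num_id: int
    abstract_num_id: int
    levels: Dict[int, NumberingLevelSketch] = field(default_factory=dict)

    def level(self, index: int) -> Optional[NumberingLevelSketch]:
        # Return None instead of raising when the level is not defined.
        return self.levels.get(index)


info = NumberingInfoSketch(
    num_id=1,
    abstract_num_id=0,
    levels={
        0: NumberingLevelSketch(),
        1: NumberingLevelSketch(format="bullet", start=1, text="o"),
    },
)

first = info.level(0)    # top-level entry; indexes start at zero
missing = info.level(5)  # undefined level, so callers should check for None
print(first, missing)
```

Returning None for an undefined level mirrors the advice under Common Issues to check a lookup result before accessing its properties.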
Tips and Best Practices
- Call parse_numbering_part() once after loading a document to populate all list definitions
- Use is_ordered_list() to distinguish numbered lists from bulleted lists
- Use get_style_chain() to parse inherited style chains in a single call
- Numbering and style parsers are used internally by the document conversion pipeline
Common Issues
| Issue | Cause | Fix |
|---|---|---|
| Empty numbering definitions | Document has no lists | Check get_list_info() return value before accessing properties |
| Missing style | Style name not recognized | Use parse() with a known style name |
| Incorrect list level | Wrong level parameter | List levels are zero-indexed |
API Reference Summary
| Class / Method | Description |
|---|---|
| NumberingParser.parse_numbering_part() | Parse DOCX numbering definitions |
| NumberingParser.get_list_info() | Query list information by ID |
| NumberingParser.is_ordered_list() | Check if a list level is ordered |
| StyleParser.parse() | Parse a style name into structured information |
| StyleParser.get_style_chain() | Parse a chain of inherited style names |
| NumberingInfo | Numbering definition data model |
| NumberingLevel | Level definition with format and start value |