Parsers

Internal API: the classes on this page are not part of the public API and are not exported from the top-level aspose.words_foss package. They are implementation details of the document conversion pipeline and may change without notice. Most developers should use Document.save() and Document.get_text() instead; see Core Management for the public API.

Aspose.Words FOSS for Python includes specialized parsers that extract structured data from DOCX internals. NumberingParser handles list numbering definitions, and StyleParser classifies document style names.


Numbering Parser

NumberingParser reads the numbering definitions from a DOCX package and exposes them through a query API. Use parse_numbering_part() to load numbering XML, then query list properties with helper methods.

Method                                  Description
NumberingParser.parse_numbering_part()  Parse the DOCX numbering element
NumberingParser.get_list_info()         Get information about a specific list by ID
NumberingParser.get_level_info()        Get level details for a list at a given depth
NumberingParser.is_ordered_list()       Check whether a list level is ordered or bulleted
NumberingParser.get_start_value()       Get the starting number for a list level
NumberingParser.get_delimiter()         Get the delimiter string for a list level
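
To make the underlying data concrete, the following is a minimal sketch of reading a numbering part with the Python standard library only. It does not call or reproduce NumberingParser, whose signatures are not documented here. The function read_numbering, the helper _attr, the dict layout, and the hand-written XML sample are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by DOCX numbering parts.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hand-written sample: one abstract definition with two levels, one list instance.
SAMPLE = f"""<w:numbering xmlns:w="{W}">
  <w:abstractNum w:abstractNumId="0">
    <w:lvl w:ilvl="0">
      <w:start w:val="1"/>
      <w:numFmt w:val="decimal"/>
      <w:lvlText w:val="%1."/>
    </w:lvl>
    <w:lvl w:ilvl="1">
      <w:start w:val="1"/>
      <w:numFmt w:val="bullet"/>
      <w:lvlText w:val="o"/>
    </w:lvl>
  </w:abstractNum>
  <w:num w:numId="1">
    <w:abstractNumId w:val="0"/>
  </w:num>
</w:numbering>"""


def _attr(element, name):
    """Read a w:-namespaced attribute; None if the element or attribute is absent."""
    if element is None:
        return None
    return element.get(f"{{{W}}}{name}")


def read_numbering(xml_text):
    """Hypothetical helper: map each numId to its zero-indexed level definitions."""
    root = ET.fromstring(xml_text)

    # First pass: collect level definitions per abstract numbering ID.
    abstract = {}
    for abs_num in root.findall("w:abstractNum", NS):
        levels = {}
        for lvl in abs_num.findall("w:lvl", NS):
            levels[int(_attr(lvl, "ilvl"))] = {
                "format": _attr(lvl.find("w:numFmt", NS), "val"),
                "start": int(_attr(lvl.find("w:start", NS), "val") or 1),
                "text": _attr(lvl.find("w:lvlText", NS), "val"),
            }
        abstract[_attr(abs_num, "abstractNumId")] = levels

    # Second pass: resolve each list instance to its abstract definition.
    lists = {}
    for num in root.findall("w:num", NS):
        abs_id = _attr(num.find("w:abstractNumId", NS), "val")
        lists[_attr(num, "numId")] = abstract.get(abs_id, {})
    return lists


numbering = read_numbering(SAMPLE)
```

In this sketch, numbering["1"][0] holds the first (zero-indexed) level of list 1. The two-pass structure mirrors how a DOCX list instance points at a shared abstract definition rather than carrying its own levels.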

Style Parser

StyleParser parses style names into structured ParsedStyle objects. It identifies headings, blockquotes, code blocks, and list paragraphs from DOCX style names.

Method                            Description
StyleParser.parse()               Parse a style name into a ParsedStyle object
StyleParser.get_style_chain()     Parse a chain of style names for inherited styles
StyleParser.is_setext_heading()   Check if a style is a Setext-style heading
StyleParser.extract_all_styles()  Extract individual style names from a comma-separated chain
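
As an illustration of this kind of classification, here is a minimal sketch that maps style names to a small result object and splits a comma-separated chain. It is not the library's StyleParser or ParsedStyle. The names SketchStyle, classify_style, and split_chain, and the set of recognized style names, are hypothetical choices for this example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchStyle:
    """Hypothetical result type; not the library's ParsedStyle."""
    name: str
    kind: str                    # "heading", "blockquote", "code", "list" or "other"
    level: Optional[int] = None  # heading level, set only when kind == "heading"


# Assumed style names; a real document may use different or localized names.
_HEADING = re.compile(r"^heading\s*(\d)$", re.IGNORECASE)
_KINDS = {
    "quote": "blockquote",
    "code": "code",
    "list paragraph": "list",
}


def classify_style(name: str) -> SketchStyle:
    """Classify one style name; unknown names fall back to kind 'other'."""
    cleaned = name.strip()
    match = _HEADING.match(cleaned)
    if match:
        return SketchStyle(cleaned, "heading", int(match.group(1)))
    return SketchStyle(cleaned, _KINDS.get(cleaned.lower(), "other"))


def split_chain(chain: str) -> List[str]:
    """Split a comma-separated chain into individual names, dropping empty parts."""
    return [part.strip() for part in chain.split(",") if part.strip()]


styles = [classify_style(n) for n in split_chain("Heading 2, Quote, Body Text")]
```

Falling back to "other" instead of raising keeps the sketch tolerant of unrecognized names, which corresponds to the "Missing style" case listed under Common Issues.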

Numbering Data Model

Parsed numbering data is stored in structured objects:

Class           Description
NumberingInfo   Numbering definition with num_id, abstract_num_id, and levels
NumberingLevel  Level definition with format, start, and text properties
ListInfo        Information about a specific list instance
ListLevelInfo   Level-specific formatting details
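
The shape of these objects can be sketched with dataclasses. The classes below are stand-ins that reuse only the attribute names listed above. Their names (with the Sketch suffix), field types, defaults, and the is_ordered helper are assumptions, not the library's definitions. The helper treats the WordprocessingML format value "bullet" as unordered, which may differ from how is_ordered_list() decides.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NumberingLevelSketch:
    """Stand-in for a level definition; types and defaults are assumed."""
    format: str = "decimal"  # e.g. "decimal", "lowerLetter", "bullet"
    start: int = 1           # first number shown at this level
    text: str = "%1."        # level text pattern


@dataclass
class NumberingInfoSketch:
    """Stand-in for a numbering definition keyed by zero-indexed level."""
    num_id: int
    abstract_num_id: int
    levels: Dict[int, NumberingLevelSketch] = field(default_factory=dict)


def is_ordered(info: NumberingInfoSketch, level: int) -> bool:
    """Return True when the level exists and its format is not 'bullet'."""
    lvl = info.levels.get(level)
    return lvl is not None and lvl.format != "bullet"


info = NumberingInfoSketch(
    num_id=1,
    abstract_num_id=0,
    levels={
        0: NumberingLevelSketch(),                          # numbered top level
        1: NumberingLevelSketch(format="bullet", text="o"),  # bulleted sub-level
    },
)
```

Here is_ordered(info, 0) is true and is_ordered(info, 1) is false. A level that is not defined, such as 5, also yields false instead of raising.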

Tips and Best Practices

  • Call parse_numbering_part() once after loading a document to populate all list definitions
  • Use is_ordered_list() to distinguish numbered lists from bulleted lists
  • Use get_style_chain() to parse inherited style chains in a single call
  • Both parsers are internal to the document conversion pipeline; prefer Document.save() and Document.get_text() in application code

Common Issues

Issue                        Cause                      Fix
Empty numbering definitions  Document has no lists      Check get_list_info() return value before accessing properties
Missing style                Style name not recognized  Use parse() with a known style name
Incorrect list level         Wrong level parameter      Pass a zero-based level; list levels are zero-indexed

API Reference Summary

Class / Method                          Description
NumberingParser.parse_numbering_part()  Parse DOCX numbering definitions
NumberingParser.get_list_info()         Query list information by ID
NumberingParser.is_ordered_list()       Check if a list level is ordered
StyleParser.parse()                     Parse a style name into structured information
StyleParser.get_style_chain()           Parse a chain of inherited style names
NumberingInfo                           Numbering definition data model
NumberingLevel                          Level definition with format and start value