PDF Parser

PDFParser

PDFParser reads a raw PDF byte stream and constructs the document's COS object graph. It is the entry point for turning an existing PDF byte source into a navigable object model.
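As an illustration of the kind of work a PDF parser does before it can build an object graph, the sketch below locates the startxref offset near the end of a file, which is where the PDF format records the position of the cross-reference data. This is not PDFParser's implementation; the class StartXrefSketch and the method findStartXref are hypothetical names, and the 1024-byte tail window is an assumption chosen for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the library's API.
class StartXrefSketch {
    /**
     * Returns the byte offset written after the last "startxref" keyword
     * in the final 1024 bytes of the file, or -1 if none is found.
     */
    static long findStartXref(byte[] pdf) {
        // Only the tail of the file is scanned (assumed window: 1024 bytes).
        int from = Math.max(0, pdf.length - 1024);
        String tail = new String(pdf, from, pdf.length - from, StandardCharsets.ISO_8859_1);

        int kw = tail.lastIndexOf("startxref");
        if (kw < 0) {
            return -1;
        }
        int i = kw + "startxref".length();
        // Skip the end-of-line marker(s) after the keyword.
        while (i < tail.length() && Character.isWhitespace(tail.charAt(i))) {
            i++;
        }
        int start = i;
        // Collect the decimal offset.
        while (i < tail.length() && tail.charAt(i) >= '0' && tail.charAt(i) <= '9') {
            i++;
        }
        return i > start ? Long.parseLong(tail.substring(start, i)) : -1;
    }
}
```

A full parser would then seek to that offset, read the cross-reference data and trailer, and resolve indirect objects from there.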

PDFLexer

PDFLexer tokenizes the PDF syntax from a byte stream, identifying keywords, names, strings, numbers, and other lexical elements. It is the underlying scanner used by PDFParser.
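To make these lexical categories concrete, here is a minimal Java sketch of a PDF tokenizer. It is not the library's PDFLexer: the class MiniPdfLexer, its Token and Kind types, and the tokenize method are hypothetical names introduced for illustration. The sketch also ignores stream bodies and #xx escapes in names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified tokenizer for illustration; not the library's PDFLexer.
class MiniPdfLexer {
    enum Kind { NAME, NUMBER, STRING, HEX_STRING, KEYWORD, DELIMITER }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // PDF whitespace: NUL, TAB, LF, FF, CR, SPACE.
    private static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    private static boolean isDelimiter(int c) {
        return "()<>[]{}/%".indexOf(c) >= 0;
    }

    private static boolean isRegular(int c) {
        return !isWhitespace(c) && !isDelimiter(c);
    }

    private static String text(byte[] data, int start, int end) {
        return new String(data, start, end - start, StandardCharsets.ISO_8859_1);
    }

    static List<Token> tokenize(byte[] data) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < data.length) {
            int c = data[i] & 0xFF;
            if (isWhitespace(c)) { i++; continue; }
            if (c == '%') { // comment: skip to end of line
                while (i < data.length && data[i] != '\n' && data[i] != '\r') i++;
                continue;
            }
            int start = i;
            if (c == '(') { // literal string with balanced parentheses
                int depth = 0;
                do {
                    int b = data[i] & 0xFF;
                    if (b == '\\') i++;            // skip the escaped byte
                    else if (b == '(') depth++;
                    else if (b == ')') depth--;
                    i++;
                } while (i < data.length && depth > 0);
                i = Math.min(i, data.length);
                out.add(new Token(Kind.STRING, text(data, start, i)));
            } else if (c == '<' && i + 1 < data.length && data[i + 1] == '<') {
                i += 2;
                out.add(new Token(Kind.DELIMITER, "<<"));
            } else if (c == '>' && i + 1 < data.length && data[i + 1] == '>') {
                i += 2;
                out.add(new Token(Kind.DELIMITER, ">>"));
            } else if (c == '<') { // hex string
                while (i < data.length && data[i] != '>') i++;
                i = Math.min(i + 1, data.length);
                out.add(new Token(Kind.HEX_STRING, text(data, start, i)));
            } else if (c == '/') { // name
                i++;
                while (i < data.length && isRegular(data[i] & 0xFF)) i++;
                out.add(new Token(Kind.NAME, text(data, start, i)));
            } else if (isDelimiter(c)) { // remaining single-byte delimiters
                i++;
                out.add(new Token(Kind.DELIMITER, text(data, start, i)));
            } else { // run of regular characters: number or keyword
                while (i < data.length && isRegular(data[i] & 0xFF)) i++;
                String s = text(data, start, i);
                boolean numeric = s.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)");
                out.add(new Token(numeric ? Kind.NUMBER : Kind.KEYWORD, s));
            }
        }
        return out;
    }
}
```

Keeping the scanner free of any object-level knowledge (it emits obj and endobj as plain keywords) leaves the job of assembling dictionaries, arrays, and indirect objects to the parser.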

PDFWriter

PDFWriter serializes the COS object model back to a PDF byte stream. It generates the cross-reference table and handles incremental updates.
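For readers unfamiliar with cross-reference tables, the following sketch shows the classic textual layout defined by the PDF format: each entry is exactly 20 bytes, made of a 10-digit byte offset, a 5-digit generation number, an in-use (n) or free (f) flag, and a two-byte end-of-line. It is not PDFWriter's implementation; XrefSketch and buildXref are hypothetical names, and the sketch assumes every object is in use with generation 0.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of the library's API.
class XrefSketch {
    /**
     * Builds a classic cross-reference section for objects 1..offsets.length,
     * where offsets[k] is the byte offset of object k + 1 in the file.
     */
    static String buildXref(long[] offsets) {
        StringBuilder sb = new StringBuilder();
        sb.append("xref\n");
        // One subsection starting at object 0, covering object 0 plus all listed objects.
        sb.append("0 ").append(offsets.length + 1).append('\n');
        // Object 0 is the head of the free list and carries generation 65535.
        sb.append("0000000000 65535 f \n");
        for (long off : offsets) {
            // Space + LF forms the two-byte end-of-line, keeping each entry at 20 bytes.
            sb.append(String.format(Locale.ROOT, "%010d %05d n \n", off, 0));
        }
        return sb.toString();
    }
}
```

For example, buildXref(new long[] {17, 81}) yields a three-entry table. A real writer would record each offset while serializing the objects and would append a new section, rather than rewrite the table, when producing an incremental update.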

Using the Parser API

The Document constructor invokes the parser internally when given a file path or stream. For direct access to the parser, call Document.getParser() to obtain the PDFParser instance associated with the loaded document:

try (Document doc = new Document("input.pdf")) {
    PDFParser parser = doc.getParser();
    // Access low-level document structure
}

See Also