PDF Parser
PDFParser
PDFParser reads and interprets the raw PDF byte stream, constructing the COS
object graph for the document. It is the entry point for parsing an existing PDF
byte source into a navigable object model.
PDFLexer
PDFLexer tokenizes the PDF syntax from a byte stream, identifying keywords,
names, strings, numbers, and other lexical elements. It is the underlying scanner
used by PDFParser.
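To make the tokenizing step concrete, here is a minimal sketch of a scanner for the lexical elements listed above. It is not the PDFLexer implementation: the class PdfTokenSketch, the Kind enum, and the Token type are hypothetical names introduced for illustration, and the sketch works on a String rather than a byte stream for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: PdfTokenSketch, Kind, and Token are hypothetical names,
// not part of the library's API.
public class PdfTokenSketch {

    enum Kind { NAME, NUMBER, STRING, KEYWORD, DELIMITER }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + ":" + text; }
    }

    // White-space characters in PDF syntax: NUL, TAB, LF, FF, CR, SPACE.
    static boolean isWhitespace(char c) {
        return c == 0 || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
    }

    // Delimiter characters; they end any regular token that precedes them.
    static boolean isDelimiter(char c) {
        return "()<>[]{}/%".indexOf(c) >= 0;
    }

    static List<Token> tokenize(String src) {
        List<Token> out = new ArrayList<>();
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            if (isWhitespace(c)) {
                i++;
            } else if (c == '%') {
                // Comment: skip to the end of the line.
                while (i < n && src.charAt(i) != '\n' && src.charAt(i) != '\r') i++;
            } else if (c == '/') {
                // Name: runs until the next white-space or delimiter character.
                int start = ++i;
                while (i < n && !isWhitespace(src.charAt(i)) && !isDelimiter(src.charAt(i))) i++;
                out.add(new Token(Kind.NAME, src.substring(start, i)));
            } else if (c == '(') {
                // Literal string: balanced parentheses may nest; a backslash escapes
                // the next character. The raw text is kept, escapes are not decoded.
                int start = ++i;
                int depth = 1;
                while (i < n && depth > 0) {
                    char s = src.charAt(i);
                    if (s == '\\') i++;
                    else if (s == '(') depth++;
                    else if (s == ')') depth--;
                    i++;
                }
                int end = (depth == 0) ? i - 1 : n;
                out.add(new Token(Kind.STRING, src.substring(start, end)));
                i = Math.min(i, n);
            } else if ((c == '<' || c == '>') && i + 1 < n && src.charAt(i + 1) == c) {
                // Dictionary delimiters << and >>.
                out.add(new Token(Kind.DELIMITER, "" + c + c));
                i += 2;
            } else if (c == '<') {
                // Hexadecimal string: everything up to the closing '>'.
                int end = src.indexOf('>', i);
                if (end < 0) end = n;
                out.add(new Token(Kind.STRING, src.substring(i + 1, end)));
                i = Math.min(end + 1, n);
            } else if (isDelimiter(c)) {
                out.add(new Token(Kind.DELIMITER, String.valueOf(c)));
                i++;
            } else {
                // Regular token: a number if it looks like one, otherwise a keyword.
                int start = i;
                while (i < n && !isWhitespace(src.charAt(i)) && !isDelimiter(src.charAt(i))) i++;
                String text = src.substring(start, i);
                boolean numeric = text.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)");
                out.add(new Token(numeric ? Kind.NUMBER : Kind.KEYWORD, text));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("<< /Type /Catalog /Count 3 >> (a(b)c) obj")) {
            System.out.println(t);
        }
    }
}
```

The sketch classifies tokens by their first character, which is enough to separate names, strings, and delimiters. A production lexer would also need to decode string escapes and #xx sequences in names, and to handle stream data, none of which is attempted here.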
PDFWriter
PDFWriter serializes the COS object model back to a PDF byte stream. It handles
cross-reference table generation and incremental update logic.
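As background for the cross-reference step, the sketch below formats a classic (non-stream) cross-reference section as laid out by the PDF file format: each entry is exactly 20 bytes, made of a 10-digit byte offset, a 5-digit generation number, a type flag (n for in use, f for free), and a two-byte end-of-line. The class XrefSketch and its methods are hypothetical and do not reflect how PDFWriter is implemented; incremental updates are not covered.

```java
import java.util.Locale;

// Illustrative only: XrefSketch is a hypothetical helper, not part of the library.
public class XrefSketch {

    // One 20-byte entry: "oooooooooo ggggg t" followed by CR LF.
    static String entry(long offset, int generation, boolean inUse) {
        return String.format(Locale.ROOT, "%010d %05d %c\r\n",
                offset, generation, inUse ? 'n' : 'f');
    }

    // A single subsection starting at object 0. offsets[k] is assumed to be the
    // byte offset of object k + 1, all with generation 0.
    static String section(long[] offsets) {
        StringBuilder sb = new StringBuilder();
        sb.append("xref\n");
        sb.append("0 ").append(offsets.length + 1).append('\n');
        // Object 0 is always free and heads the free list with generation 65535.
        sb.append(entry(0, 65535, false));
        for (long off : offsets) {
            sb.append(entry(off, 0, true));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(section(new long[] {17, 81}));
    }
}
```

Fixed-width entries are what allow a reader to seek directly to the entry for a given object number without scanning the whole table.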
Using the Parser API
The Document class constructor invokes the parser internally when given a file
path or stream. For direct parser access, use Document.getParser() to obtain the
PDFParser instance associated with the loaded document:
try (Document doc = new Document("input.pdf")) {
    PDFParser parser = doc.getParser();
    // Access low-level document structure
}