CMap (Character Mapping)
CMap in PDF
A CMap (character map) maps the character codes in a PDF content stream to character identifiers (CIDs) in a composite font. A ToUnicode CMap uses the same syntax to map character codes to Unicode values, and it is this mapping that supports text extraction, searching, and accessibility in PDF documents (ISO 32000-1:2008, §9.10).
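To make the mapping concrete, here is a minimal sketch of reading the bfchar entries of a ToUnicode CMap, where each entry pairs a source code with a UTF-16BE destination value (for example `<0003> <0020>`). The class and method names (`ToUnicodeSketch`, `parseBfChars`) are hypothetical and are not part of this library. The sketch ignores bfrange sections and codespace ranges, so it should be read as an illustration rather than a complete parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the documented API.
final class ToUnicodeSketch {
    // One bfchar entry: <source code> <UTF-16BE destination>, both in hex.
    private static final Pattern BFCHAR_ENTRY =
            Pattern.compile("<([0-9A-Fa-f]+)>\\s*<([0-9A-Fa-f]+)>");

    /** Collects every entry found between beginbfchar and endbfchar. */
    static Map<Integer, String> parseBfChars(String cmap) {
        Map<Integer, String> mapping = new LinkedHashMap<>();
        int start = cmap.indexOf("beginbfchar");
        while (start >= 0) {
            int end = cmap.indexOf("endbfchar", start);
            if (end < 0) {
                break; // unterminated section: stop rather than guess
            }
            String section = cmap.substring(start + "beginbfchar".length(), end);
            Matcher m = BFCHAR_ENTRY.matcher(section);
            while (m.find()) {
                int code = Integer.parseInt(m.group(1), 16);
                mapping.put(code, decodeUtf16Hex(m.group(2)));
            }
            start = cmap.indexOf("beginbfchar", end);
        }
        return mapping;
    }

    /** Turns a hex string of 16-bit units (UTF-16BE) into a Java String. */
    private static String decodeUtf16Hex(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 4 <= hex.length(); i += 4) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 4), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String cmap = "2 beginbfchar\n<0003> <0020>\n<0024> <0041>\nendbfchar";
        System.out.println(parseBfChars(cmap).size());
    }
}
```

The destination is kept as a String rather than a single code point because one character code may map to several Unicode characters, as with ligatures.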
AdobeGlyphList
AdobeGlyphList provides a static mapping from Adobe glyph names to Unicode code points:
int unicode = AdobeGlyphList.getUnicode("bullet"); // returns 0x2022
boolean exists = AdobeGlyphList.contains("copyright"); // true
This is used when decoding glyph names from Type 1 and TrueType fonts that use the Adobe standard glyph naming convention.
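Glyph names found in real fonts are not always listed verbatim: they may carry a variant suffix (such as `.sc`) or use the algorithmic `uniXXXX` and `uXXXX` forms described in the Adobe Glyph List Specification. The following is a rough sketch of how such names might be resolved. It uses a tiny stand-in table instead of the real AdobeGlyphList class, and `GlyphNameSketch`, `SAMPLE_GLYPHS`, and `toUnicode` are hypothetical names. It also simplifies the specification: underscore-joined ligature names and multi-group `uni` names are not handled.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of the documented API.
final class GlyphNameSketch {
    // Tiny stand-in for the full glyph list, which has thousands of entries.
    private static final Map<String, Integer> SAMPLE_GLYPHS = Map.of(
            "bullet", 0x2022,
            "copyright", 0x00A9,
            "space", 0x0020,
            "A", 0x0041);

    /** Returns the Unicode code point for a glyph name, or -1 if unresolved. */
    static int toUnicode(String glyphName) {
        // Drop a variant suffix such as ".alt" or ".sc" before the lookup.
        int dot = glyphName.indexOf('.');
        String base = dot >= 0 ? glyphName.substring(0, dot) : glyphName;

        Integer listed = SAMPLE_GLYPHS.get(base);
        if (listed != null) {
            return listed;
        }
        // "uniXXXX": exactly four uppercase hex digits.
        if (base.matches("uni[0-9A-F]{4}")) {
            return Integer.parseInt(base.substring(3), 16);
        }
        // "uXXXX" through "uXXXXXX": four to six uppercase hex digits.
        if (base.matches("u[0-9A-F]{4,6}")) {
            return Integer.parseInt(base.substring(1), 16);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(toUnicode("bullet")));
    }
}
```

Returning -1 for an unknown name leaves the fallback decision to the caller, for example substituting U+FFFD or skipping the character.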
Text Encoding
PDF text decoding draws on several font resources: the font’s Encoding dictionary maps character codes to glyph names, the ToUnicode CMap maps character codes to Unicode values, and the font’s character width table supplies the glyph advances used to position text. Correct CMap handling is essential for reliable text extraction.
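As an illustration of how these pieces can combine during extraction for a simple font, the sketch below tries the ToUnicode mapping first and falls back to the Encoding's glyph name plus a glyph list lookup, which mirrors the order given in ISO 32000-1:2008, §9.10.2. The class `TextDecodeSketch` and the plain `Map` parameters are assumptions made for this example; an actual implementation would read these tables from the font dictionary.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of the documented API.
final class TextDecodeSketch {
    private static final String REPLACEMENT = "\uFFFD";

    /**
     * Resolves one character code to text.
     *
     * @param toUnicode code -> text, as read from a ToUnicode CMap (may be empty)
     * @param encoding  code -> glyph name, as read from the Encoding dictionary
     * @param glyphList glyph name -> Unicode code point
     */
    static String decode(int code,
                         Map<Integer, String> toUnicode,
                         Map<Integer, String> encoding,
                         Map<String, Integer> glyphList) {
        // 1. An explicit ToUnicode entry takes priority.
        String mapped = toUnicode.get(code);
        if (mapped != null) {
            return mapped;
        }
        // 2. Otherwise go through the glyph name from the Encoding.
        String glyphName = encoding.get(code);
        if (glyphName != null) {
            Integer codePoint = glyphList.get(glyphName);
            if (codePoint != null) {
                return new String(Character.toChars(codePoint));
            }
        }
        // 3. Nothing matched: emit the Unicode replacement character.
        return REPLACEMENT;
    }

    public static void main(String[] args) {
        Map<Integer, String> toUnicode = Map.of(1, "fi");
        Map<Integer, String> encoding = Map.of(2, "bullet");
        Map<String, Integer> glyphList = Map.of("bullet", 0x2022);
        System.out.println(decode(1, toUnicode, encoding, glyphList));
    }
}
```

The width table is deliberately left out here: it affects where glyphs are placed, not which characters they represent.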