CMap (Character Map)

CMap in PDF

A CMap (character map) maps the character codes that appear in a PDF content stream to character identifiers (CIDs) in a composite font or, in the case of a ToUnicode CMap, to Unicode values. ToUnicode CMaps support text extraction, searching, and accessibility in PDF documents (ISO 32000-1:2008, §9.10).
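
To make the code-to-Unicode mapping concrete, here is a minimal sketch that reads the beginbfchar ... endbfchar sections of a ToUnicode CMap into a lookup table. The class and method names (ToUnicodeSketch, parseBfChars) are hypothetical and not part of any library API described here. The sketch ignores bfrange sections and codespace ranges, and it assumes destination values are whole UTF-16BE code units.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class ToUnicodeSketch {
    // One bfchar section: everything between the two operators.
    private static final Pattern BFCHAR_SECTION =
            Pattern.compile("beginbfchar(.*?)endbfchar", Pattern.DOTALL);
    // One bfchar entry: <source code> <destination as UTF-16BE hex>.
    private static final Pattern BFCHAR_ENTRY =
            Pattern.compile("<([0-9A-Fa-f]+)>\\s*<([0-9A-Fa-f]+)>");

    /** Collects every bfchar entry as character code -> Unicode string. */
    static Map<Integer, String> parseBfChars(String cmapText) {
        Map<Integer, String> table = new HashMap<>();
        Matcher section = BFCHAR_SECTION.matcher(cmapText);
        while (section.find()) {
            Matcher entry = BFCHAR_ENTRY.matcher(section.group(1));
            while (entry.find()) {
                // Simplification: the byte length of the source code is dropped,
                // so <01> and <0001> end up under the same key.
                int code = Integer.parseInt(entry.group(1), 16);
                table.put(code, decodeUtf16Be(entry.group(2)));
            }
        }
        return table;
    }

    /** Turns hex digits into a string, four digits per UTF-16 code unit. */
    private static String decodeUtf16Be(String hex) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i + 4 <= hex.length(); i += 4) {
            out.append((char) Integer.parseInt(hex.substring(i, i + 4), 16));
        }
        return out.toString();
    }
}
```

A destination may hold more than one code unit, which is how a single code can map to a ligature expansion such as "fi" or to a surrogate pair.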

AdobeGlyphList

AdobeGlyphList provides a static mapping from Adobe glyph names to Unicode code points:

int unicode = AdobeGlyphList.getUnicode("bullet");  // returns 0x2022
boolean exists = AdobeGlyphList.contains("copyright");  // true

This lookup is used to convert glyph names to Unicode for Type 1 and TrueType fonts that follow the Adobe standard glyph naming convention.
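
The glyph naming convention also covers names that are not listed individually, such as uni20AC or u1F600, and names with a suffix such as bullet.alt. The sketch below shows one way such names could be resolved. GlyphNameSketch and its three-entry table are hypothetical stand-ins and do not reflect how AdobeGlyphList is implemented. It handles only a single uniXXXX group and skips underscore-joined ligature names.

```java
import java.util.Map;

// Hypothetical helper, for illustration only.
final class GlyphNameSketch {
    // Tiny stand-in for the full glyph list.
    private static final Map<String, Integer> NAMES = Map.of(
            "bullet", 0x2022,
            "copyright", 0x00A9,
            "space", 0x0020);

    /** Returns the Unicode code point for a glyph name, or -1 if unknown. */
    static int toUnicode(String glyphName) {
        // Drop any suffix after the first period, e.g. "bullet.alt" -> "bullet".
        int dot = glyphName.indexOf('.');
        String base = dot >= 0 ? glyphName.substring(0, dot) : glyphName;

        Integer known = NAMES.get(base);
        if (known != null) {
            return known;
        }
        // "uni" followed by exactly four uppercase hex digits.
        if (base.matches("uni[0-9A-F]{4}")) {
            return Integer.parseInt(base.substring(3), 16);
        }
        // "u" followed by four to six uppercase hex digits.
        if (base.matches("u[0-9A-F]{4,6}")) {
            return Integer.parseInt(base.substring(1), 16);
        }
        return -1;
    }
}
```

Returning -1 for an unknown name keeps the sketch short; an OptionalInt would serve equally well.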

Text Encoding

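PDF text encoding maps character codes to glyphs through the font’s Encoding dictionary (or, for composite fonts, a CMap). The ToUnicode CMap is a separate layer that maps the same character codes to Unicode for text extraction, and the font’s character width table affects only glyph positioning, not which glyph is selected. Correct CMap handling is essential for reliable text extraction.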
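
One common order for resolving a character code to Unicode during extraction is to consult the ToUnicode CMap first and fall back to the glyph name supplied by the font's Encoding. The sketch below illustrates that order with plain maps. CodeToUnicodeSketch, resolve, and the three map parameters are hypothetical names chosen for this example, not an existing API.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, for illustration only.
final class CodeToUnicodeSketch {
    /**
     * Resolves a character code to text.
     *
     * @param toUnicode code -> Unicode string, as read from a ToUnicode CMap
     * @param encoding  code -> glyph name, as given by the font's Encoding
     * @param glyphList glyph name -> Unicode code point
     */
    static Optional<String> resolve(int code,
                                    Map<Integer, String> toUnicode,
                                    Map<Integer, String> encoding,
                                    Map<String, Integer> glyphList) {
        // 1. A ToUnicode entry takes priority when present.
        String direct = toUnicode.get(code);
        if (direct != null) {
            return Optional.of(direct);
        }
        // 2. Otherwise go from code to glyph name to code point.
        String glyphName = encoding.get(code);
        if (glyphName != null) {
            Integer codePoint = glyphList.get(glyphName);
            if (codePoint != null) {
                return Optional.of(new String(Character.toChars(codePoint)));
            }
        }
        // 3. No mapping found; the caller decides how to report it.
        return Optional.empty();
    }
}
```

Returning an empty Optional rather than a placeholder character lets the caller choose between skipping the code and emitting U+FFFD.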

See Also