CMap (Character Mapping)

CMap in PDF

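A CMap (character map) maps the character codes that appear in a PDF content stream to character selectors. For a composite (Type 0) font, the selectors are CIDs (ISO 32000-1:2008, §9.7.5); for a ToUnicode CMap, they are Unicode values (§9.10.3). ToUnicode CMaps are what make text extraction, searching, and accessibility work in PDF documents (§9.10).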
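To make the code-to-Unicode idea concrete, here is a minimal sketch of reading the `beginbfchar` entries of a ToUnicode CMap and applying them to two-byte character codes. The class and method names (`ToUnicodeSketch`, `parseBfChars`, `decode`) are hypothetical and not part of any library API. The sketch assumes fixed two-byte codes and ignores `bfrange` and `codespacerange` sections, which a real parser would need to handle.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class, not a library API.
final class ToUnicodeSketch {
    // One bfchar entry, e.g. "<0024> <0041>": source code, then UTF-16BE destination.
    private static final Pattern BFCHAR_ENTRY =
            Pattern.compile("<([0-9A-Fa-f]+)>\\s*<([0-9A-Fa-f]+)>");
    private static final Pattern BFCHAR_SECTION =
            Pattern.compile("beginbfchar(.*?)endbfchar", Pattern.DOTALL);

    /** Collects code -> text pairs from every beginbfchar ... endbfchar section. */
    static Map<Integer, String> parseBfChars(String cmap) {
        Map<Integer, String> table = new HashMap<>();
        Matcher section = BFCHAR_SECTION.matcher(cmap);
        while (section.find()) {
            Matcher entry = BFCHAR_ENTRY.matcher(section.group(1));
            while (entry.find()) {
                int code = Integer.parseInt(entry.group(1), 16);
                table.put(code, utf16beHexToString(entry.group(2)));
            }
        }
        return table;
    }

    /** Converts hex such as "00660069" to the string "fi" (four hex digits per UTF-16 unit). */
    static String utf16beHexToString(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 4 <= hex.length(); i += 4) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 4), 16));
        }
        return sb.toString();
    }

    /** Decodes a string of two-byte codes (an assumption of this sketch); unmapped codes become U+FFFD. */
    static String decode(byte[] bytes, Map<Integer, String> table) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 1 < bytes.length; i += 2) {
            int code = ((bytes[i] & 0xFF) << 8) | (bytes[i + 1] & 0xFF);
            sb.append(table.getOrDefault(code, "\uFFFD"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String cmap = "3 beginbfchar\n"
                + "<0003> <0020>\n"
                + "<0024> <0041>\n"
                + "<00C0> <00660069>\n"
                + "endbfchar\n";
        Map<Integer, String> table = parseBfChars(cmap);
        byte[] shown = {0x00, 0x24, 0x00, (byte) 0xC0};
        System.out.println(decode(shown, table)); // prints "Afi"
    }
}
```

Note that one code may map to several Unicode characters, as with the ligature entry above, which is why the table stores strings rather than single code points.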

AdobeGlyphList

AdobeGlyphList provides a static mapping from Adobe glyph names to Unicode code points:

int unicode = AdobeGlyphList.getUnicode("bullet");  // returns 0x2022
boolean exists = AdobeGlyphList.contains("copyright");  // true

This lookup is used to convert glyph names to Unicode for Type 1 and TrueType fonts that follow the Adobe standard glyph naming convention.
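The naming convention also covers names that are not listed individually, such as `uni20AC` or `u1F600`, which encode the code point in the name itself. The following is a minimal sketch of that resolution logic. It does not use the `AdobeGlyphList` class shown above: `GlyphNameSketch`, `toUnicode`, and the three-entry `SAMPLE_AGL` table are hypothetical stand-ins. Underscore-joined component names and surrogate range checks are left out for brevity.

```java
import java.util.Map;

// Hypothetical illustration class; SAMPLE_AGL stands in for the full glyph list.
final class GlyphNameSketch {
    private static final Map<String, Integer> SAMPLE_AGL = Map.of(
            "bullet", 0x2022,
            "copyright", 0x00A9,
            "A", 0x0041);

    /** Returns the Unicode code point for a glyph name, or -1 if it cannot be resolved. */
    static int toUnicode(String glyphName) {
        // Drop any suffix after the first period, e.g. "A.swash" -> "A".
        int dot = glyphName.indexOf('.');
        String base = dot >= 0 ? glyphName.substring(0, dot) : glyphName;

        // 1. Names listed in the glyph list.
        Integer listed = SAMPLE_AGL.get(base);
        if (listed != null) {
            return listed;
        }
        // 2. "uni" followed by exactly four uppercase hex digits.
        if (base.matches("uni[0-9A-F]{4}")) {
            return Integer.parseInt(base.substring(3), 16);
        }
        // 3. "u" followed by four to six uppercase hex digits.
        if (base.matches("u[0-9A-F]{4,6}")) {
            return Integer.parseInt(base.substring(1), 16);
        }
        return -1;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"bullet", "uni20AC", "u1F600", "A.swash", "nosuchglyph"}) {
            System.out.printf("%s -> %d%n", name, toUnicode(name));
        }
    }
}
```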

Text Encoding

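Decoding PDF text draws on several font resources. The font’s Encoding maps character codes to glyph names, the ToUnicode CMap maps character codes to Unicode values, and the font’s character width table supplies glyph advances, which affect positioning rather than character identity. Correct CMap handling is essential for reliable text extraction.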
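One common way to combine these sources is a fallback chain: consult the ToUnicode CMap first, then fall back to the Encoding plus a glyph-name lookup, and finally emit a replacement character. The sketch below illustrates that order using plain maps. `TextDecodeSketch`, `codeToText`, and the map parameters are hypothetical, and the maps stand in for whatever structures a real library exposes.

```java
import java.util.Map;

// Hypothetical illustration class showing one possible lookup order.
final class TextDecodeSketch {
    static final String REPLACEMENT = "\uFFFD";

    /**
     * Resolves one character code to text.
     * toUnicode: code -> text, as read from a ToUnicode CMap (may be empty).
     * encoding:  code -> glyph name, as read from the font's Encoding.
     * glyphList: glyph name -> text, e.g. built from the Adobe Glyph List.
     */
    static String codeToText(int code,
                             Map<Integer, String> toUnicode,
                             Map<Integer, String> encoding,
                             Map<String, String> glyphList) {
        // 1. A ToUnicode entry takes priority when present.
        String direct = toUnicode.get(code);
        if (direct != null) {
            return direct;
        }
        // 2. Otherwise go through the glyph name.
        String glyphName = encoding.get(code);
        if (glyphName != null) {
            String viaName = glyphList.get(glyphName);
            if (viaName != null) {
                return viaName;
            }
        }
        // 3. Nothing matched: signal the gap instead of guessing.
        return REPLACEMENT;
    }

    public static void main(String[] args) {
        Map<Integer, String> toUnicode = Map.of(0x24, "A");
        Map<Integer, String> encoding = Map.of(0x24, "A", 0x95, "bullet", 0x99, "g123");
        Map<String, String> glyphList = Map.of("A", "A", "bullet", "\u2022");

        System.out.println(codeToText(0x24, toUnicode, encoding, glyphList)); // via ToUnicode
        System.out.println(codeToText(0x95, toUnicode, encoding, glyphList)); // via glyph name
        System.out.println(codeToText(0x99, toUnicode, encoding, glyphList)); // replacement character
    }
}
```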
