Information Storage and Retrieval: Character Sets and Code Pages

Tuesday, January 18, 2011

Character Sets and Code Pages

In computer science, the terms character encoding, character map, character set, and code page were historically used as synonyms.

A character set is an agreement on which numeric value each symbol has. A computer does not understand 'A' or 'B'; it only stores and processes the numeric (binary) value of a symbol, as defined in the character set used by its operating system. Because a computer only 'understands' numbers, character sets are needed to map symbols to numbers and back.
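The symbol-to-number mapping described above can be seen directly in most languages. The following is a minimal Python sketch; the variable names are made up for illustration, and Python is used here only as a convenient way to inspect the values.

```python
# Look up the numeric value (code point) behind a symbol, and go back again.
code_a = ord("A")   # numeric value agreed on for 'A'
code_b = ord("B")   # numeric value agreed on for 'B'
symbol = chr(65)    # the symbol that the value 65 stands for

print(code_a, code_b, symbol)    # 65 66 A

# The same value written as the binary pattern a computer actually stores.
binary_a = format(code_a, "08b")
print(binary_a)                  # 01000001
```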

ASCII is a 7-bit character set, so it defines only 128 (2^7) symbols. UTF-8 is a variable-width (multibyte) encoding of Unicode that uses one to four bytes per character (UTF8/AL32UTF8 in Oracle).
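To make the difference between a 7-bit set and a multibyte encoding concrete, here is a small Python sketch. It assumes nothing beyond the standard `str.encode` method; the variable names are illustrative only.

```python
# An ASCII symbol takes one byte in both ASCII and UTF-8.
ascii_a = "A".encode("ascii")
utf8_a = "A".encode("utf-8")
print(ascii_a, utf8_a)           # b'A' b'A'

# Symbols outside ASCII need more than one byte in UTF-8.
utf8_e_acute = "é".encode("utf-8")
utf8_euro = "€".encode("utf-8")
print(len(utf8_e_acute), len(utf8_euro))   # 2 3

# ASCII has no value for such symbols, so encoding fails.
try:
    "€".encode("ascii")
    ascii_can_encode_euro = True
except UnicodeEncodeError:
    ascii_can_encode_euro = False
print(ascii_can_encode_euro)     # False
```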

A code page is another name for a character encoding. It consists of a table that maps numeric values to the characters of a particular character set, often one aimed at a particular language or group of languages. Vendors often allocate their own code page number to a character encoding, even if it is better known by another name. For example, the UTF-8 character encoding is code page 1208 at IBM, 65001 at Microsoft, and 4110 at SAP.
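Because a code page is just a lookup table, the same byte can stand for different symbols depending on which table is used. The sketch below illustrates this with two code pages that ship with Python's standard codecs, `cp1252` (Windows Western European) and `cp437` (the original IBM PC code page); the choice of these two and the variable names are for illustration only.

```python
# One byte with the numeric value 0xE9 (233).
raw = b"\xe9"

# Each code page is a different table, so the same value maps to a different symbol.
as_cp1252 = raw.decode("cp1252")
as_cp437 = raw.decode("cp437")

print(as_cp1252)   # é
print(as_cp437)    # Θ
```

This is why text must be decoded with the same code page it was written in; otherwise the numbers are read correctly but shown as the wrong symbols.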
