Introduction

This appendix discusses the relationship between character sets (or alphabets) and their encoding on computers. It uses the terminology of the Unicode standard (Unicode Standard vol. 1.0).

Three levels of representation can be discerned:

character
glyph
code

A character is the basic unit of an alphabet. Within the alphabet it has a name, a position, and a content meaning. For example, the character named ``a'' is the first letter of the standard Latin alphabet. Its content meaning (in the European languages) is loosely related to its pronunciation, i.e. a vocalic sound with the IPA description front, low, unrounded. A character has no visible graphic representation of its own; that representation is produced by rendering the character on a suitable medium such as paper or a computer screen.

A glyph is the ``essential shape'' of a character; it is the result of the rendering process. Glyphs can be modified through the application of case, font, style, and size operations. For example, the essential shape of the first letter of the Latin alphabet in lower case is the a. Depending on the font and style, this glyph may be set in a monospaced font, slanted, or made bold, among other variations.

A code is a mapping of characters to a set of symbols or signs, e.g. numbers or other characters. This mapping is in general arbitrary, and it must be known in order to encode a character or decode a code. For example, in 7-bit ASCII, the character ``a'' is encoded as the 7-bit integer 97.
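The distinction between a character and its code can be illustrated with a short sketch. It uses Python's built-in `ord`, `chr`, and `str.encode` together with the standard `unicodedata` module; the choice of Python and of the EBCDIC code page `cp037` as a contrasting code are assumptions made for illustration and do not come from the Unicode text cited above.

```python
import unicodedata

ch = "a"

# The character has a name that is independent of any particular code.
name = unicodedata.name(ch)

# ASCII maps the character to the integer 97, which fits in 7 bits.
ascii_code = ord(ch)
ascii_bytes = ch.encode("ascii")

# A different code (EBCDIC, code page 037; chosen only as a contrast)
# maps the same character to a different number.
ebcdic_code = ch.encode("cp037")[0]

# Decoding needs the same mapping that was used for encoding.
decoded_ascii = chr(97)
decoded_ebcdic = bytes([ebcdic_code]).decode("cp037")

print(name, ascii_code, hex(ebcdic_code), decoded_ascii, decoded_ebcdic)
```

The same character thus receives different codes under different mappings, which is why the mapping must be known before a code can be decoded.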

A script consists of an alphabet and a set of rules that determine the direction of writing (left to right, right to left, top to bottom, etc.) and the composition of characters (placement of accents, combination of glyphs, etc.).
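As a rough illustration of such rules, the sketch below queries the character properties shipped with Python's standard `unicodedata` module. This reflects the Unicode version bundled with the Python interpreter, not version 1.0, and the sample characters (Latin a, Hebrew alef, combining acute accent) are chosen here purely as examples.

```python
import unicodedata

latin_a = "a"
hebrew_alef = "\u05d0"          # a letter from a right-to-left script
combining_acute = "\u0301"      # an accent that attaches to a base letter

# Direction of writing: 'L' is left to right, 'R' is right to left.
dir_latin = unicodedata.bidirectional(latin_a)
dir_hebrew = unicodedata.bidirectional(hebrew_alef)

# Composition: a non-zero combining class marks a character that is
# placed relative to the preceding base character.
accent_class = unicodedata.combining(combining_acute)

# A base letter followed by an accent can be composed into one character.
decomposed = latin_a + combining_acute
composed = unicodedata.normalize("NFC", decomposed)

print(dir_latin, dir_hebrew, accent_class, len(decomposed), len(composed))
```

The direction and combining properties belong to the characters, while the actual placement of the accent on the page is left to the rendering process.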

EAGLES SWLG SoftEdition, May 1997.