Problems


ASCII and its 8-bit extensions are not sufficient for representing scripts that contain more than 256 different characters, or mixed text documents that use different alphabets in one document.

Scripts with very large character inventories, e.g. those used for Chinese, Japanese, and Korean, require far more than the 256 characters that an 8-bit extension of ASCII can encode. Multi-byte code tables are needed to adequately represent the character sets of these scripts.
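The 256-character limit can be illustrated with a short Python sketch. The sample characters and the helper name fits_in_one_byte are chosen here for illustration only; the codec names (gb2312, shift_jis, euc_kr, latin-1) are those of Python's standard codec registry.

```python
# Illustrative sample characters, one per legacy multi-byte code table.
samples = {
    "gb2312": "中",      # Chinese
    "shift_jis": "日",   # Japanese
    "euc_kr": "한",      # Korean
}


def fits_in_one_byte(ch: str) -> bool:
    """Return True if ch has a code in the 8-bit Latin-1 table."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Number of bytes each character needs in its multi-byte code table.
encoded_lengths = {}
for codec, ch in samples.items():
    encoded_lengths[codec] = len(ch.encode(codec))
    print(codec, ch, fits_in_one_byte(ch), encoded_lengths[codec])
```

None of the sample characters has a code in the 8-bit table, while each multi-byte code table represents its character with two bytes.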

Mixed text documents, e.g. regular text with mathematical formulae or phonetic transcriptions, are based on multiple code tables. The document is divided into sections, each encoded with the appropriate code table, and markers assign a code table to each section. In practice, code tables are usually switched by applying a particular font, e.g. a phonetic font, to a section of the document. However, this is an abuse of the font mechanism: a font should only vary the appearance of a character's glyph, whereas here it assigns a different character to the same code. For example, when the font "Symbol" is applied to the character sequence abc on the Macintosh, abc is displayed as the Greek letters αβχ, which changes the content, not just its appearance.
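The dependence of meaning on the active code table can be sketched in Python. This is not the Macintosh Symbol font itself (Python ships no such codec); as a stand-in assumption, the same byte is decoded with two standard 8-bit tables, ISO 8859-1 and ISO 8859-7.

```python
# One stored byte; its meaning depends entirely on the code table in effect.
raw = bytes([0xE1])

as_latin = raw.decode("latin-1")     # Western European table: a with acute
as_greek = raw.decode("iso8859_7")   # Greek table: small alpha

print(repr(as_latin), repr(as_greek))
```

The stored data never changes, yet the character it denotes does, which is why a document that relies on switching tables cannot be interpreted without its markers.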

Mixed text documents also require that every font used in a document be present on the machine on which the document is processed. This makes porting documents to different architectures difficult.

Again, multi-byte character encodings provide sufficient space for all characters needed in a document and thus avoid mixed text documents altogether.
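As a minimal sketch of this point, the following Python fragment stores Latin, Greek, phonetic, and mathematical characters in one string under a single multi-byte encoding. UTF-8 is used here as an assumed example of such an encoding; the text above does not name a specific one, and the sample string is illustrative.

```python
# Latin letters, Greek letters, an IPA symbol, and a math operator together.
text = "abc αβχ ʃ ∑"

encoded = text.encode("utf-8")     # one code table for the whole document
decoded = encoded.decode("utf-8")  # no font or table switching needed

# Bytes needed per character: more for characters outside ASCII.
widths = {ch: len(ch.encode("utf-8")) for ch in text if ch != " "}
print(decoded == text, widths)
```

Every character keeps its own code, so no section markers or special fonts are needed to recover the content.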

EAGLES SWLG SoftEdition, May 1997.