
ASCII

ASCII codes come in various flavours: the original 7-bit ASCII code, platform-dependent variations and extensions such as the Mac ASCII or the code pages of IBM PCs, multinational extensions such as the ISO 8859 family, and application-dependent extensions such as the character entity sets of ISO 8879 (SGML).

7-bit ASCII

7-bit ASCII (also known as US-ASCII or ANSI X3.4), as defined by the American National Standards Institute, is the most widespread code for the computer representation of characters. The 128 code values of US-ASCII are sufficient for the standard English alphabet, punctuation marks, digits, some mathematical operators, and control codes. For many uses, however, this code is far too restricted: it has no room for accented letters or for non-Latin scripts.
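As a rough illustration of how the 128 codes are divided up, the following sketch sorts every 7-bit code into a category. The function name `classify` and the category labels are chosen here for illustration only and are not part of any standard.

```python
def classify(code: int) -> str:
    """Return a rough category for a 7-bit US-ASCII code (0..127)."""
    if not 0 <= code <= 127:
        raise ValueError("not a 7-bit code")
    if code < 32 or code == 127:
        return "control"          # codes 0..31 and DEL (127)
    ch = chr(code)
    if ch == " ":
        return "space"
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "letter"
    return "punctuation/symbol"

# Tally how many of the 128 codes fall into each category.
counts = {}
for code in range(128):
    kind = classify(code)
    counts[kind] = counts.get(kind, 0) + 1

print(counts)

# A word with an umlaut cannot be represented in 7-bit ASCII.
try:
    "Zürich".encode("ascii")
except UnicodeEncodeError as err:
    print("not representable:", err.reason)
```

The tally shows that only 52 of the 128 codes are letters, which is why no accented characters fit into the table.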

The ISO 646 family is a set of standards for 7-bit code tables that differ from US-ASCII in a few language-dependent positions. In the German code table, for example, the square brackets and curly braces of 7-bit ASCII are replaced by German umlauts, and in the British code table the # is replaced by the £ symbol.
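The effect of such national variants can be sketched with simple substitution tables. Python ships no codec for these variants, so the tables below are written by hand; the names `ISO646_DE`, `ISO646_GB` and `show_as` are hypothetical, and the British table contains only the substitution named in the text.

```python
# Hand-written substitution tables (an assumption of this sketch):
# the same 7-bit code is displayed as a different character.
ISO646_DE = str.maketrans("@[\\]{|}~", "§ÄÖÜäöüß")   # German variant
ISO646_GB = str.maketrans("#", "£")                  # British variant (partial)

def show_as(text_us: str, table: dict) -> str:
    """Show how text written for US-ASCII appears under a national variant."""
    return text_us.translate(table)

source_line = "a[i] = {x}"
print(show_as(source_line, ISO646_DE))   # brackets and braces become umlauts
print(show_as("item #3", ISO646_GB))
```

This is why program source code that relies on brackets and braces became hard to read on terminals using a national 7-bit table.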

Platform dependent ASCII

Many hardware vendors, especially in the PC market, implemented proprietary extensions to the 7-bit ASCII standard.

The Macintosh uses an 8-bit ASCII extension which was meant to cover all languages using the Latin alphabet; complex characters could be composed from more than one single character, e.g. by adding an accent or a dieresis.

On the IBM PC there exist various 8-bit ASCII extensions for individual languages. This reduces the need for character composition from single characters, but introduces incompatibilities between the different ASCII extensions.
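The incompatibility can be made concrete by decoding one and the same byte under several platform code tables. The sketch below uses the codecs `cp437`, `cp850` and `mac_roman` from the Python standard library as stand-ins for two IBM PC code pages and the Macintosh table; the choice of these three and the helper name `decode_everywhere` are assumptions made for illustration.

```python
CODECS = ["cp437", "cp850", "mac_roman"]

def decode_everywhere(byte_value: int) -> dict:
    """Decode a single byte under each platform code table."""
    raw = bytes([byte_value])
    return {name: raw.decode(name) for name in CODECS}

# Codes below 128 agree on all three platforms ...
print(decode_everywhere(0x41))

# ... but codes above 127 do not.
print(decode_everywhere(0x9B))

# Conversely, the same character may sit at different codes.
for name in CODECS:
    print(name, "ä".encode(name).hex())
```

A file written on one platform and read on another without translation therefore shows wrong characters wherever codes above 127 occur.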

ISO 8859

The International Organization for Standardization (ISO) has defined an 8-bit extension to ASCII called ISO 8859-1 or Latin-1. This extension leaves the 7-bit US-ASCII codes unchanged and adds the most common complex characters from the Latin alphabets in the upper half of the table. These complex characters include some fractions, special symbols such as the registered trademark symbol, and accented characters.
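That Latin-1 leaves the 7-bit codes untouched can be checked directly. The following sketch compares the lower half of the table with US-ASCII and looks up a few codes from the upper half; the helper name `extends_ascii` is hypothetical.

```python
def extends_ascii(codec: str) -> bool:
    """True if codes 0..127 decode exactly as in US-ASCII."""
    seven_bit = bytes(range(128))
    return seven_bit.decode(codec) == seven_bit.decode("ascii")

print(extends_ascii("latin-1"))

# A few of the characters added in the upper half (codes 128..255).
for code in (0xAE, 0xBD, 0xE9):
    print(hex(code), bytes([code]).decode("latin-1"))
```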

The ISO 8859 family has not replaced the platform dependent ASCII codes. However, since it has been officially released as a standard, it serves as a reference code table for the translation of code tables in most forms of electronic communication, e.g. e-mail, news, and others.
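Using a reference table for interchange means that each platform only needs a translation to and from that one table rather than to every other platform. A minimal sketch, assuming ISO 8859-1 as the reference and `cp437` and `mac_roman` as the local tables; the function names are hypothetical, and Python performs the translation through its internal string type rather than through a direct byte-to-byte table.

```python
REFERENCE = "latin-1"

def to_reference(data: bytes, local_codec: str) -> bytes:
    """Translate bytes from a local code table into the reference table."""
    return data.decode(local_codec).encode(REFERENCE)

def from_reference(data: bytes, local_codec: str) -> bytes:
    """Translate bytes from the reference table into a local code table."""
    return data.decode(REFERENCE).encode(local_codec)

pc_bytes = b"Gr\x81\xe1e"                      # "Grüße" on an IBM PC (cp437)
wire_bytes = to_reference(pc_bytes, "cp437")   # what is sent
mac_bytes = from_reference(wire_bytes, "mac_roman")

print(pc_bytes.hex(), wire_bytes.hex(), mac_bytes.hex())
```

Characters that exist in a local table but not in the reference table raise `UnicodeEncodeError` in this sketch; a real gateway would need a policy for them.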

The ISO 8859 family has also been extended to non-Latin scripts such as Cyrillic, Arabic, and Hebrew. Each part is numbered and named for the script it covers, e.g. ISO 8859-5 for Cyrillic, ISO 8859-6 for Arabic, and ISO 8859-8 for Hebrew.
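Because all parts share the same 8-bit code space, a byte above 127 only has a meaning once the part is known. The sketch below decodes one byte under four parts using the codecs from the Python standard library; the dictionary `PARTS` and the helper `decode_in_all_parts` are names chosen for illustration.

```python
import unicodedata

PARTS = {
    "ISO 8859-1 (Latin-1)": "iso8859_1",
    "ISO 8859-5 (Cyrillic)": "iso8859_5",
    "ISO 8859-6 (Arabic)": "iso8859_6",
    "ISO 8859-8 (Hebrew)": "iso8859_8",
}

def decode_in_all_parts(byte_value: int) -> dict:
    """Decode a single byte under each listed part of ISO 8859."""
    raw = bytes([byte_value])
    return {label: raw.decode(codec) for label, codec in PARTS.items()}

# The same code, 0xE0, is a different character in every part.
for label, ch in decode_in_all_parts(0xE0).items():
    print(f"{label}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```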


EAGLES SWLG SoftEdition, May 1997.