

The SAMPA alphabet was developed in the late 1980s by John Wells, in consultation with a wide range of colleagues, to meet a need for a simple machine-readable encoding of phonetic transcriptions with symbols of the International Phonetic Alphabet (IPA) for file interchange purposes. At that time, standardisation of symbol codes and IPA fonts was not highly developed. The underlying principle of SAMPA was to select those IPA symbols which were conventionally used to represent phonemes in the major languages of the European Union, and to assign a 7-bit ASCII code number (below 128) to each. One of the secondary criteria was the visual similarity of the IPA symbol and the letter representing the ASCII code.
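The principle can be illustrated with a minimal sketch, which is not part of the SAMPA definition itself. It assumes a small hand-picked subset of well-known SAMPA symbols, given here with the Unicode code points of the corresponding IPA characters; the names `SAMPA_TO_IPA`, `is_seven_bit` and `sampa_to_ipa` are hypothetical and chosen only for this example.

```python
# Illustrative subset only: a few well-known SAMPA symbols and the IPA
# characters they represent. This is NOT the complete SAMPA table.
SAMPA_TO_IPA = {
    "S": "\u0283",  # voiceless postalveolar fricative, as in "fish"
    "Z": "\u0292",  # voiced postalveolar fricative, as in "vision"
    "T": "\u03b8",  # voiceless dental fricative, as in "thing"
    "D": "\u00f0",  # voiced dental fricative, as in "this"
    "N": "\u014b",  # velar nasal, as in "thing"
    "@": "\u0259",  # schwa
    "{": "\u00e6",  # near-open front unrounded vowel, as in "cat"
    "I": "\u026a",  # near-close front unrounded vowel, as in "fish"
    "U": "\u028a",  # near-close back rounded vowel, as in "put"
}


def is_seven_bit(text):
    """Return True if every character has a code number below 128."""
    return all(ord(ch) < 128 for ch in text)


def sampa_to_ipa(transcription):
    """Convert a SAMPA string to IPA one character at a time.

    Characters missing from the subset table (for example the lower-case
    letters, which keep their IPA value) are passed through unchanged.
    Assumption: the input uses only single-character symbols; multi-character
    X-SAMPA codes are not handled by this sketch.
    """
    return "".join(SAMPA_TO_IPA.get(ch, ch) for ch in transcription)


# Every SAMPA code in the table fits in 7-bit ASCII; the IPA targets do not.
assert all(is_seven_bit(code) for code in SAMPA_TO_IPA)
assert not any(is_seven_bit(ipa) for ipa in SAMPA_TO_IPA.values())
```

With this table, `sampa_to_ipa("TIN")` yields the IPA string θɪŋ ("thing"), while the SAMPA form itself remains plain ASCII and can be exchanged between systems without any special font or encoding.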

Since that time, the standardisation of IPA encoding has progressed, with the system developed by John Esling (the 'Esling codes'), and, more recently, Unicode representations. For practical purposes, however, little has changed at the time of writing, and there is still a need for a straightforward machine-readable encoding.

In the meantime, SAMPA has come to be widely used, and extensions of SAMPA have now been developed for many other languages. In order to aid the development of such extensions, the extended code-set X-SAMPA was devised by John Wells; it encompasses the complete set of IPA conventions. For a number of symbols, human readability had to be sacrificed in favour of simple, unambiguous machine-readability, owing to the restricted number of ASCII codes. The present collation of SAMPA and X-SAMPA is by Inge Mertins.

For further details, consult Gibbon et al. (1997) and the relevant IPA and SAMPA Internet sites, including project sites with working versions of SAMPA for specific languages.

Several systems are available for prosodic annotation; a number of them are discussed in Chapter 1 of Gibbon et al. (2000). The most widely used in extensive corpus annotation, computational linguistics and speech technology is currently ToBI (Tones and Break Indices); the SAMPROSA system (Gibbon et al. 1997) contains additional symbols which are suitable for more detailed dialogue transcription.

Readers should be aware that there is still considerable need for standardisation with respect to the use of IPA codes and fonts in consumer software such as word processors and Internet browsers.


Gibbon, Dafydd, Roger Moore & Richard Winski, eds. (1997). Handbook of Standards & Resources for Spoken Language Systems. Berlin: Mouton de Gruyter.

Gibbon, Dafydd, Inge Mertins & Roger Moore, eds. (2000). Handbook of Multimodal and Spoken Dialogue Systems. Dordrecht: Kluwer Academic Publishers.


Dafydd Gibbon, Wed Aug 9 11:26:42 CEST 2000