The citation-phonemic level (also referred to as the "phonemic" level by [Barry & Fourcin (1992)]) may contain the output phoneme string derived from the orthographic form (by lexical access, by letter-to-sound rules, or both). There are various possibilities for representing the phoneme symbols. Some platforms can display the full range of IPA symbols, such as those provided by the LaTeX font wsuipa11 (see Table 5.1).
However, many researchers will need to use some other means, such as an alphabetic or numeric representation of the IPA symbols. The numeric equivalents of all IPA symbols are listed in [Esling & Gaylord (1993)]. An equivalent alphabetic system is used on the newsgroup sci.lang.
If the requirement is narrowed to symbols for the main European languages only, then the SAMPA system (see Appendix B) will be sufficient. This system has the advantage that, for any given language, only one grapheme is used per phoneme, with no spaces in between. Other systems that have been proposed (principally for English) allow a string of two or more graphemes to represent a phoneme, but a space is then necessary between successive phoneme representations.
In the case of English, there are still more alphabetic systems that have been used in the past, such as (for American English) ARPABET and KLATTBET, and (for British English RP) Edinburgh's Machine-Readable Phonetic Alphabet, all of which use short grapheme strings separated by a blank space. However, a language-specific set of alphabetic phoneme symbols has not yet been devised for all possible languages.
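The two segmentation conventions described above (one character per phoneme versus space-separated multi-character symbols) can be sketched in Python as follows. This is a minimal illustration only: the function names are hypothetical, and the example strings (SAMPA "k{t" and ARPABET "K AE T" for the English word "cat") are chosen purely for demonstration.

```python
from typing import List


def split_single_char(transcription: str) -> List[str]:
    """One character per phoneme: every character is a separate symbol."""
    return list(transcription)


def split_space_delimited(transcription: str) -> List[str]:
    """Multi-character symbols: whitespace marks the symbol boundaries."""
    return transcription.split()


# English "cat" in a one-character-per-phoneme notation (SAMPA).
single = split_single_char("k{t")

# The same word in a space-delimited notation (ARPABET).
spaced = split_space_delimited("K AE T")

print(single)  # ['k', '{', 't']
print(spaced)  # ['K', 'AE', 'T']
```

Without the separating spaces, a string such as "KAET" could not be segmented reliably, because the reader would not know whether "AE" is one symbol or two.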
If the corpus is confined to one language, and if the labelling is to be alphabetic rather than in true IPA symbols, then it is advisable to use a language-specific set of characters. This avoids the notational complexity that is necessary when all symbols must be kept distinct across all languages, as is needed in the study of general phonetics.
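As a rough sketch of what a language-specific character set means in practice, the following Python fragment checks a label string against a per-language inventory. The inventories shown are hypothetical and deliberately tiny, not the real symbol sets of any published system, and the names `INVENTORIES` and `unknown_symbols` are assumptions introduced only for this example.

```python
from typing import Dict, List, Set

# Hypothetical, deliberately incomplete inventories, for illustration only.
INVENTORIES: Dict[str, Set[str]] = {
    "en": {"k", "{", "t", "s", "I"},
    "de": {"k", "a", "t", "s", "I", "x"},
}


def unknown_symbols(symbols: List[str], language: str) -> List[str]:
    """Return the symbols that are not in the inventory of the given language."""
    inventory = INVENTORIES[language]
    return [s for s in symbols if s not in inventory]


label = list("k{t")  # one character per phoneme

print(unknown_symbols(label, "en"))  # []
print(unknown_symbols(label, "de"))  # ['{']
```

Because each inventory is scoped to a single language, a character only needs to be unambiguous within that language, which is what keeps the notation simple for a monolingual corpus.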