Orthography has been used in several different roles in spoken language lexica, some of which have already been noted:
Each of these functions is distinct, and they need to be kept conceptually separate in order to avoid confusion. Functions (1) and (2) are not particularly problematic. Function (3) is traditionally a feature of speech recognition systems for relatively small vocabularies. The larger the vocabulary, however, the greater the danger of introducing unnecessary orthographic noise, i.e. intrusive artefacts due to homography (words with identical spelling but different pronunciation); for this reason, in new architectures, phonological (e.g. phonemic or autosegmental) representation in word graphs may be preferred. Function (4) is unproblematic, though reservations similar to those for (3) apply. Function (5) is the main function and is obviously essential for written output of any kind; however, it is often confused with functions (2) and (3). In all cases, care with consistent orthography is essential.
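The homography problem noted for function (3) can be made concrete with a small check over a word list. The following is only a sketch under stated assumptions: the toy lexicon, the SAMPA-like pronunciation strings, and the function name `find_homographs` are illustrative and not taken from any particular system.

```python
from collections import defaultdict


def find_homographs(entries):
    """Return spellings that are paired with more than one pronunciation.

    `entries` is an iterable of (orthography, pronunciation) pairs.
    """
    by_spelling = defaultdict(set)
    for orthography, pronunciation in entries:
        by_spelling[orthography].add(pronunciation)
    # Keep only spellings with at least two distinct pronunciations.
    return {
        spelling: sorted(prons)
        for spelling, prons in by_spelling.items()
        if len(prons) > 1
    }


# Hypothetical toy lexicon; the transcriptions are illustrative only.
toy_lexicon = [
    ("lead", "l i: d"),
    ("lead", "l E d"),
    ("tear", "t I@"),
    ("tear", "t e@"),
    ("table", "t eI b l"),
]

homographs = find_homographs(toy_lexicon)
```

A recogniser that outputs only the spelling cannot distinguish the entries grouped under one key here, which is the kind of orthographic noise that grows with vocabulary size.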
Orthography has the advantage of being highly standardised, except for certain regional variants (British and American English; Federal, Swiss, and Austrian German) and variations in publishers' conventions. Examples of the latter are British English -ise/-ize, as in standardisation/standardization, and the capitalisation of adjectives in nominal function in German, as in die anderen / die Anderen. Variation is found particularly in the treatment of derived and compound words (e.g. separation and hyphenation) and in the use of typographic devices such as capitalisation. Orthography is given further attention in the section on lexical representation.
A standard orthographic transcription is often used for convenience as a means of representing and accessing words in a spoken language lexicon. There are several reasons for this:
Most European languages have highly regulated orthographies, the use of which is associated with social and political rewards and punishments. Official orthographic reforms, which typically generate much controversy among the general public, may necessitate some re-implementation of spelling checkers and grapheme-phoneme converters (cf. the ongoing reform of German orthography).
For use in spoken language lexica, particularly in word lists used for training and testing recognisers, consistency is essential, and additional conventions are often required in order to meet the criterion of general computer readability in the case of special letters and diacritics. Although it cannot be regarded as a standard, it is becoming common practice to use the ASCII codings or their LaTeX adaptations for specific countries. For example, a computer-readable orthography which marks special characters, in particular those with an Umlaut diacritic, has become widely accepted for German speech recognition applications, as shown in Table 6.2.
Standard orthography | ASCII orthography |
Äpfel | "Apfel |
ändern | "andern |
Öl | "Ol |
östlich | "ostlich |
Überzug | "Uberzug |
über | "uber |
heiß | hei"s |
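A conversion of this kind can be sketched in a few lines. The mapping below is an assumption extrapolated from the examples in the table (a double quote followed by the base letter for each Umlaut vowel, and `"s` for ß); the names `to_ascii` and `from_ascii` are hypothetical, and a real word list may need further conventions.

```python
# Assumed mapping, extrapolated from the table's examples.
TO_ASCII = {
    "Ä": '"A', "Ö": '"O', "Ü": '"U',
    "ä": '"a', "ö": '"o', "ü": '"u',
    "ß": '"s',
}
FROM_ASCII = {code: char for char, code in TO_ASCII.items()}


def to_ascii(word):
    """Replace each special character with its two-character ASCII code."""
    return "".join(TO_ASCII.get(ch, ch) for ch in word)


def from_ascii(word):
    """Invert to_ascii by scanning for known two-character codes."""
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in FROM_ASCII:
            out.append(FROM_ASCII[pair])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


words = ["Äpfel", "ändern", "Öl", "östlich", "Überzug", "über", "heiß"]
ascii_forms = [to_ascii(w) for w in words]
```

Because every code starts with a double quote, which does not otherwise occur in ordinary word forms, the mapping is reversible; this is what makes such a convention usable for consistent training and test word lists.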
The results of the EAGLES Working Groups on Text Corpora and Lexica should be consulted on orthographic and other matters pertaining to written texts.