Pronunciation information

Pronunciation information is much more application-specific (and indeed theory-specific) than orthographic information. Information about phonemic structure is normally included in the form of a phonemic transcription of a canonical or citation-form pronunciation, i.e. the pronunciation of a word in isolation in a standard variety of the language. Often the phonemic transcription is enhanced by including prosodic information such as the stress position (Dutch, English, German), the type of tonal accent (Swedish), syllable and word boundaries in compound words, and word and phrase boundaries in phrasal idioms. Morphological information (morph boundaries, as well as the boundaries of words and phrases) is relevant to stress patterning, and is sometimes also included.

A particularly thorny question is the inclusion of information about pronunciation variants, of which there are two main types: rule-governed allophonic and phonostylistic variants, and idiosyncratic lexical variants. The following rules of thumb can be given:

Pronunciation lexica for synthesis generally require one standard (canonical) pronunciation; however, variants of this pronunciation for different prosodic contexts may be required.
Pronunciation lexica for recognition require a distinction to be made between variants of the same word, and variants which are associated with the same spelling but different words (heterophonic homographs).
Strictly speaking, pronunciation lexica for recognition need to list only those lexical variants which are idiosyncratic and cannot be predicted by rule (e.g. English either /aɪðə/ - /i:ðə/). Variants which are general and regular (such as the reduction of schwa + liquid or nasal to a syllabic liquid or nasal) can be calculated using pronunciation rules (phonological rules), e.g. English running /rʌnɪŋ/ - /rʌnɪn/, German einem /aɪnəm/ - /aɪnm̩/ - /aɪm/.
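
The rules of thumb above can be sketched in code. The following minimal Python sketch assumes a hypothetical lexicon layout in which entries are keyed by a word identifier rather than by spelling, so that heterophonic homographs remain distinct, and in which only idiosyncratic variants are stored; the regular reduction of schwa + nasal or liquid is computed by a rule. Transcriptions are in SAMPA, with "=" marking a syllabic consonant. The names LEXICON, reduce_schwa, pronunciations and homographs, and the context-free form of the rule, are illustrative assumptions rather than part of any standard.

```python
import re

# Hypothetical lexicon: word id -> (spelling, stored SAMPA pronunciations).
# Only idiosyncratic variants are stored; regular variants are derived by rule.
LEXICON = {
    "either": ("either", ["aID@", "i:D@"]),  # idiosyncratic lexical variants
    "lead_v": ("lead", ["li:d"]),            # heterophonic homographs:
    "lead_n": ("lead", ["led"]),             # same spelling, different words
    "einem": ("einem", ["aIn@m"]),           # German; canonical form only
}

def reduce_schwa(sampa):
    """Simplified rule: schwa + nasal/liquid -> syllabic consonant (SAMPA '=')."""
    return re.sub(r"@([mnNl])", r"\1=", sampa)

def pronunciations(word_id):
    """Stored variants plus the variants predicted by the reduction rule."""
    _, stored = LEXICON[word_id]
    result = []
    for pron in stored:
        for variant in (pron, reduce_schwa(pron)):
            if variant not in result:
                result.append(variant)
    return result

def homographs(spelling):
    """Word ids that share a spelling (candidates for heterophonic homographs)."""
    return sorted(w for w, (s, _) in LEXICON.items() if s == spelling)
```

With this layout, pronunciations("einem") yields the canonical form and the rule-derived syllabic variant, while homographs("lead") returns both word identifiers. A realistic rule set would restrict the context of the reduction and would cover further reductions such as German /aɪm/.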

Although phoneme is a technical term with somewhat different definitions in different theoretical contexts, and although there are technical arguments due to Generative Phonology [Chomsky & Halle (1968)] which show that the notion of phoneme leads to inconsistencies, the core of phoneme theory is relatively standard. In linguistics handbooks, the phoneme is commonly defined as the minimal distinctive (meaning-distinguishing) unit (temporal segment) of sound. In the following fairly standard definition, the distinctiveness criterion is implicit in the concept of a system; the concept of a sound (= phone, allophone) covers possible variants of a phoneme (e.g. English aspirated word-initial /p/ as opposed to unaspirated /p/ in the context /sp.../) [Crystal (1985), p. 228]:

PHONEME (PHONEMIC(S)) The minimal unit in the sound SYSTEM of a LANGUAGE ... Sounds are considered to be members of the same phoneme if they are phonetically similar and do not occur in the same ENVIRONMENT.

A fairly complete definition is thus based on distinctiveness, minimality, phonetic similarity and distributional complementarity. Phoneme definitions are differential or relational definitions, illustrated by the notion of minimal difference between two words in minimal pairs such as the items in the set of English words pin-tin-kin-fin-thin-sin-shin-chin-bin-din-gin-win-Lynne-Min-Nin (the last three are names); in standard SAMPA computer-readable phonemic transcription: /pIn - tIn - kIn - fIn - TIn - sIn - SIn - tSIn - bIn - dIn - dZIn - wIn - lIn - mIn - nIn/. Phonemes defined in this way are further classified as bundles of phonological distinctive features. Operationally, phonemes are defined by procedures of segmentation and classification (reflected, for example, in the recognition and classification components of automatic speech recognition systems):

Segmentation is the procedure of isolating minimal distinctive temporal phonetic segments (phones).
Classification is the procedure of classifying phones as allophones (phonetic alternants) of the same phoneme, on the grounds of distinctiveness, minimality, phonetic similarity and complementary distribution (i.e. their occurrence in complementary contexts as contextual variants of that phoneme).
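
As a rough illustration of the classification criteria, the Python sketch below checks two of them on toy data: it finds minimal pairs in a small list of SAMPA-transcribed words (distinctiveness), and it tests whether two phones occur in disjoint left contexts (complementary distribution). The data, the function names, the ad hoc label "p_h" for aspirated [p], and the reduction of the notion of environment to the immediately preceding segment are all simplifying assumptions; phonetic similarity is not modelled.

```python
from itertools import combinations

# Toy phonemic data: each word is a list of SAMPA segments ('tS' is one segment).
WORDS = {
    "pin": ["p", "I", "n"], "tin": ["t", "I", "n"], "kin": ["k", "I", "n"],
    "chin": ["tS", "I", "n"], "thin": ["T", "I", "n"],
}

def minimal_pairs(words):
    """Pairs of words of equal length that differ in exactly one segment."""
    pairs = []
    for (w1, s1), (w2, s2) in combinations(sorted(words.items()), 2):
        if len(s1) == len(s2) and sum(a != b for a, b in zip(s1, s2)) == 1:
            pairs.append((w1, w2))
    return pairs

# Toy phonetic data: '#' is a word boundary, 'p_h' an ad hoc label for aspirated [p].
PHONETIC = [
    ["#", "p_h", "I", "n"],      # pin
    ["#", "s", "p", "I", "n"],   # spin
    ["#", "p_h", "{", "t"],      # pat
    ["#", "s", "p", "{", "t"],   # spat
]

def left_contexts(phone, transcriptions):
    """Set of segments that immediately precede the given phone."""
    return {t[i - 1] for t in transcriptions
            for i, seg in enumerate(t) if seg == phone and i > 0}

def complementary(phone_a, phone_b, transcriptions):
    """True if the two phones never share a left context in the data."""
    return left_contexts(phone_a, transcriptions).isdisjoint(
        left_contexts(phone_b, transcriptions))
```

On this data, every pair of words in WORDS is a minimal pair, and complementary("p_h", "p", PHONETIC) holds, which is consistent with treating the two phones as allophones of one phoneme; the two vowels share their left contexts and so fail the test.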

In contrast to orthographic representations, which for social and cultural reasons are highly standardised common knowledge, lexical representations of pronunciation are theory-specific and application-specific. The most widely used representations, both in pronouncing dictionaries for human use (such as in foreign language teaching) and in spoken language systems, are phonemic transcriptions.

Phonemic descriptions are available for several hundred languages, and phonemic transcriptions based on these are suitable for constructing roman orthographies for languages which have orthographies based on different principles (e.g. syllabic or logographic) or no orthography at all. For a given language, phonemic descriptions differ peripherally (for instance, it is controversial whether diphthongs and affricates are to be analysed as one phoneme or two). Phonemes are in general the units of choice for practical phonological transcriptions in spoken language system lexica. Other, more specialised types of representation, such as the feature matrix representations required by all modern phonological descriptions, autosegmental lattice representations, and metrical tree graph and histogram representations [Goldsmith (1990)], are increasingly finding application in experimental systems [Kornai (1991), Carson-Berndsen (1993), Kirchhoff (1996), Church (1987b), Church (1987a)] because of their richness and their more direct relation to the acoustic signal, in contrast to phonemic representations. However, at the lexical level they can generally be calculated relatively easily from the more compact, but less general, phonemic representations. Because of the widespread use of phonemes, the concept is discussed in more detail below; for fuller explanations, textbooks on phonology should be consulted.

The central question in phonological lexical representation, in cases where the notion of phoneme alone is not fully adequate, is that of the level of representation (level of description, level of abstraction). There are three main levels: morphophonemic, phonemic, and phonetic. Each is an essential part of a full description and needs to be evaluated for all but the simplest applications; the three levels are characterised below.

Morphophonemic:

The morphophonemic level provides a simplification of phonological information with respect to the phonemic level; the simplifications utilise knowledge about the morphological structure of words, and permit the use of morphophonemes (a near-synonym is archiphoneme), which stand for classes of morphologically and phonologically related phonemes.

A standard example of a morphophoneme is the final obstruent in languages with final obstruent devoicing, including Dutch and German. For example, the phonemic representation German Weg /ve:k/ `way' - Wege /ve:gə/ `ways' corresponds to a morphophonemic representation {ve:G} - {ve:G+ə}, which simplifies the description of the stem of the word. The morphophoneme {G} stands for the phoneme set {/k/, /g/}, and selection of the appropriate member of the set (the appropriate feature specification) is triggered by the morphological boundary and neighbouring phonological segments. Alternatively the morphophoneme may be said to consist of the underspecified feature bundle shared by /k/ and /g/, or more technically, the feature bundle which subsumes the feature bundles of /k/ and /g/.

An example from English is the alternation /f/ - /v/ in plural formation in one class of nouns, as in knife /naɪf/ - knives /naɪvz/, which can be represented morphophonemically as {naɪV} - {naɪV+z}. The morphophoneme {V} stands for the phoneme set {/f/, /v/}. Here, too, selection of the phoneme (specification of the underspecified subsuming feature bundle) is determined by the morphological boundary and the phonological properties of neighbouring segments.
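
The selection of a phoneme for a morphophoneme can be sketched as a small realisation rule. The Python sketch below illustrates the German case only and is not one of the formalisms used in practice: each morphophoneme is realised as a voiceless obstruent in word-final position and as its voiced counterpart elsewhere, "+" marks a morph boundary, and "@" is the SAMPA symbol for schwa. The table FINAL_DEVOICING and the function realise are hypothetical names, and the restriction of devoicing to word-final position is a simplifying assumption.

```python
# Morphophoneme -> (realisation in word-final position, realisation elsewhere).
# The capital letters G, D, B are morphophonemes here, not SAMPA symbols.
FINAL_DEVOICING = {"G": ("k", "g"), "D": ("t", "d"), "B": ("p", "b")}

def realise(morphophonemic):
    """Map a morphophonemic string such as 've:G+@' to a phonemic string."""
    segments = morphophonemic.replace("+", "")  # boundaries are not pronounced
    out = []
    for i, seg in enumerate(segments):
        if seg in FINAL_DEVOICING:
            final, elsewhere = FINAL_DEVOICING[seg]
            # Word-final position selects the voiceless member of the set.
            out.append(final if i == len(segments) - 1 else elsewhere)
        else:
            out.append(seg)
    return "".join(out)
```

For example, realise("ve:G") gives the singular form with /k/ and realise("ve:G+@") gives the plural form with /g/, so that a single stem representation serves both word forms.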

A corresponding level is necessary for the description of spelling: cf. variations such as English y-ie in city - cities, or German s-ss-ß as in Bus - Busse, Kuß (Kuss in the new orthography) - Küsse and Fuß - Füße.

Morphophonemic representations augmented by realisation rules are a useful compression technique for reducing lexicon size:

Lexica can be stem-based, and thus have fewer entries, and all inflections can be automatically calculated by rule for any stem in the lexicon.
Morphotactic and morphophonological rules can be used for extending lexica of fully inflected attested forms, and for checking such lexica for consistency.
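
Both uses can be sketched together. The Python sketch below assumes a hypothetical stem lexicon for English nouns with the /f/ - /v/ alternation, generates singular and plural forms by rule, and checks a small lexicon of attested full forms against the generated forms. Transcriptions are in SAMPA; the morphophoneme is stored separately from the SAMPA string so that it cannot be confused with the SAMPA vowel symbol V. All names and data are illustrative assumptions, and one attested entry is deliberately inconsistent.

```python
# Hypothetical stem lexicon: spelling -> (SAMPA stem base, stem-final morphophoneme).
STEMS = {"knife": ("naI", "V"), "wife": ("waI", "V"), "leaf": ("li:", "V")}

# Realisations of each morphophoneme: word-finally and before a suffix.
ALTERNATIONS = {"V": {"final": "f", "before_suffix": "v"}}
PLURAL_SUFFIX = "z"

def inflect(base, morphophoneme):
    """Generate phonemic singular and plural forms for one stem."""
    alt = ALTERNATIONS[morphophoneme]
    return {
        "sg": base + alt["final"],
        "pl": base + alt["before_suffix"] + PLURAL_SUFFIX,
    }

# Attested fully inflected forms, e.g. collected from a corpus-based lexicon.
ATTESTED = {
    ("knife", "sg"): "naIf",
    ("knife", "pl"): "naIvz",
    ("wife", "pl"): "waIfs",   # deliberately inconsistent entry
}

def inconsistencies(stems, attested):
    """List attested forms that differ from the rule-generated forms."""
    problems = []
    for (word, number), form in sorted(attested.items()):
        expected = inflect(*stems[word])[number]
        if form != expected:
            problems.append((word, number, form, expected))
    return problems
```

Here three stem entries stand in for six word forms, and inconsistencies(STEMS, ATTESTED) reports the one attested plural that the rules do not predict; whether such a mismatch is an error or a genuine lexical exception still has to be decided by the lexicographer.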

For requirements such as these, the use of morphophonemic representations, supplemented by morphological construction rules and morphophonemic mapping rules, is recommended (see [Koskenniemi (1983)], [Karttunen (1983)], [Ritchie et al. (1992)], [Bleiching et al. (1996)] for descriptions of various practical approaches).

There are no standard conventions for the representation of morphophonemes, whether computer-readable or not (but see the SAMPA alphabet for French, Appendix B); capital letters are often used in linguistics publications. Note that this use of capital letters at the morphophonemic level should not be confused with the use of ASCII upper case codes in the SAMPA alphabet at the phonemic level.

Citations of morphophonemic representations are often delimited with brace brackets {...}.

Phonemic:

The phonemic level is a standard intermediate level corresponding to criteria outlined in more detail below. The standard European computer-readable phonetic alphabet is SAMPA (Appendix B): this alphabet is used for the main languages of the European Union, and is recommended for this purpose. The internationally recognised standard alphabet for phonemic representations is the International Phonetic Alphabet (IPA). The IPA is used for the most part in the text of this handbook, and is shown in Appendix A. One of the main functions of the International Phonetic Association since its inception over 100 years ago has been to coordinate and define standards for this alphabet.

Until relatively recently, the special font used for the IPA has made it difficult to interface it with spoken language systems, and for this reason a number of computer-readable encodings of subsets of the IPA have been made for various languages [Allen (1988), Esling (1988), Esling (1990), Jassem & Łobacz (1989), Ball (1991)]. The standard computer phonetic alphabet for the main languages of the European Union is the SAMPA alphabet, developed in the ESPRIT SAM and SAM-A projects [Wells (1987), Wells (1989), Wells (1993b), Wells (1993a), Llisterri & Mariño (1993)]; see also Appendix B. SAMPA is widely used in European projects, both for corpus transcription and for lexical representations (see also the chapter on Spoken Language Corpora).

However, there is a standard numerical code for IPA symbols (cf. [Esling (1988), Esling (1990)]; Appendix B), and developments in user interfaces with graphical visualisation in recent years are leading to the increasing use of the IPA in its original form, particularly in the speech lab software which is used in spoken language system development.

Citations of phonemic representations are standardly delimited by slashes /.../.
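
Because SAMPA is an ASCII encoding of IPA symbols, conversion between the two can be sketched as a table lookup. The Python sketch below covers only a small subset of the SAMPA symbols used for English and passes all other characters through unchanged; the table SAMPA_TO_IPA and the function sampa_to_ipa are illustrative names, and the full symbol inventories in the appendices should be taken as authoritative. Two-character symbols are matched before one-character symbols, which is a simplifying assumption: a sequence such as t + S across a morph boundary would be read as the affricate.

```python
# Subset of SAMPA-to-IPA correspondences for English (not a complete table).
SAMPA_TO_IPA = {
    "tS": "tʃ", "dZ": "dʒ",            # affricates (two-character symbols)
    "I": "ɪ", "V": "ʌ", "@": "ə",      # vowels
    "T": "θ", "D": "ð", "S": "ʃ", "Z": "ʒ", "N": "ŋ",  # consonants
}

def sampa_to_ipa(sampa):
    """Convert a SAMPA string to IPA, trying the longest symbols first."""
    out = []
    i = 0
    while i < len(sampa):
        for length in (2, 1):
            chunk = sampa[i:i + length]
            if chunk in SAMPA_TO_IPA:
                out.append(SAMPA_TO_IPA[chunk])
                i += len(chunk)
                break
        else:
            # Symbols shared by SAMPA and IPA (p, t, k, n, ...) pass through.
            out.append(sampa[i])
            i += 1
    return "".join(out)
```

For instance, sampa_to_ipa("tSIn") and sampa_to_ipa("TIn") give the IPA forms of chin and thin, while a string such as "pIn" changes only in its vowel symbol.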

Phonetic:

At the phonetic level further details of pronunciation, beyond the phonemically minimal features, are given. Since the relation between the phonemic and the phonetic level can be described by general rules mapping phonemes to their detailed realisations (allophones) in specific contexts [Woods & Zue (1976)], it is strictly speaking redundant to include these regular variants in a lexicon. However, for reasons of efficiency, detailed phonetic word models for speech recogniser training or for speech synthesis may be calculated using phonological rules and stored. Essentially this is a software decision: whether to use tables (for efficiency of lookup) or rules (for compactness and generality) for a given purpose.
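
The choice between tables and rules can be illustrated with a minimal Python sketch. It assumes a single, much simplified allophonic rule (word-initial voiceless stops are aspirated), words represented as tuples of SAMPA segments, and the ad hoc label "_h" for aspiration; the names to_phonetic, TABLE and lookup are hypothetical. The rule is applied once to a fixed vocabulary and the results are stored, while words outside the table are still handled by the rule.

```python
VOICELESS_STOPS = {"p", "t", "k"}

def to_phonetic(segments):
    """Rule: aspirate a word-initial voiceless stop (simplified context)."""
    segs = list(segments)
    if segs and segs[0] in VOICELESS_STOPS:
        segs[0] += "_h"
    return tuple(segs)

# Table approach: apply the rule once to the vocabulary and store the results.
VOCABULARY = [("p", "I", "n"), ("s", "p", "I", "n"), ("tS", "I", "n")]
TABLE = {word: to_phonetic(word) for word in VOCABULARY}

def lookup(segments):
    """Use the stored form if there is one, otherwise fall back to the rule."""
    word = tuple(segments)
    if word in TABLE:
        return TABLE[word]
    return to_phonetic(word)
```

The table costs storage proportional to the vocabulary but makes each lookup a single dictionary access; the rule costs nothing to store and generalises to new words such as ("k", "I", "n"), at the price of being recomputed on every call.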

A specific version of the phonetic level of transcription is phonotypic transcription, defined as a mapping from the phonemic level using regular phonological rules of assimilation, deletion and epenthesis [Autesserre et al. (1989)]; this level is frequently used for generating additional word models to improve speech recognition. Since the amount of phonetic detail which can be processed depends heavily on the vocabulary size and the number of phonological rules which are considered relevant, no general recommendation on this can be given.

There is no widely used standard ASCII encoding of the entire IPA for computer-readable phonetic representations, and therefore no recommendations can be given on this. A proposal by John Wells, the originator of SAMPA, is under discussion. Currently, individual laboratories use their own enhancements of phonemic representations. However, the fuller encodings mentioned in connection with the phonemic level of transcription are eminently suitable for interface purposes at the phonetic level, and will no doubt be increasingly used where more detailed phonetic information is required.

Citations of phonetic forms are standardly delimited by square brackets [...].

Chapters 3, 4 and 5 should also be consulted in respect of levels and types of corpus representation.

EAGLES SWLG SoftEdition, May 1997.