Orthographic identity and phonetic variability of lexicalised units



In correctly written texts, any morphologically inflected lexical item generally has just one distinct orthographic form. The words of European languages are therefore easily identified and well distinguished from each other, and there is usually only one version of each orthographic form that a given word can take in context. By contrast, the spoken versions of orthographically identical word forms show great phonetic variation in their segmental and prosodic realisation. In most European languages the phonetic form of a given word is in fact extremely variable, depending on the context and on other well-defined intervening variables such as speaking style, context of situation, and strong and weak Lombard effects (the influence of the physical environment on speech production via acoustic feedback). A given word can disappear phonetically altogether, or can be reduced to, and signalled only by, some reflection of its segmental features in the prosody of the utterance. Most of these inconspicuous variations appear only in a narrow phonetic transcription of a given pronunciation.
It makes a great difference whether a word has been uttered in isolation or in continuous speech. Only if a word is consciously and very carefully produced in isolation can we observe the explicit version of its segmental structure. These phonetically explicit forms, produced in a careful speaking style, are called citation forms or canonical forms. The segmental structure of citation forms is modified as soon as they are integrated into connected speech (probably systematically, although relatively little of the system is currently understood). This is highly relevant to the design of spoken language corpora. It has also been taken into account in the conventions of the IPA proposed for Computer Representation of Individual Languages (CRIL, see Appendix A).
In dealing with spoken language (SL) data, one must know which words the speaker intended to express in a given utterance. This is reflected in the CRIL convention of the IPA (see Section 5.2.4). Here it should be mentioned that an SL data collection should ideally have at least two, and possibly three, symbolically specified levels that are related to the acoustic speech signal:

On the first level the words of the given utterance are identified as lexical units in their orthographic form.
On the second level a broad phonetic transcription of the citation form should be given. This may be the result of automatic grapheme-to-phoneme conversion, since for very large SL corpora manual broad phonetic transcription would cost too much time and money.
On a third, optional CRIL level, the way the given words were actually pronounced in a given speech signal must be specified in terms of a narrower phonetic transcription of each individual utterance. This third level can then be aligned directly, either automatically or manually, to the segments or acoustic features of the digital speech signal in the database. This information is especially relevant if multi-sensor data are also to be incorporated in SL databases.
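
The three levels listed above could be held together in a single record per word, as in the following minimal Python sketch. It is an illustration only, not the CRIL format itself: the class and field names (WordAnnotation, Utterance, span_s and so on) are hypothetical, and the SAMPA-like transcriptions, file name and time stamps in the example are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class WordAnnotation:
    """One word of an utterance, annotated on up to three levels (hypothetical layout)."""
    orthographic: str                    # level 1: lexical unit in orthographic form
    citation_form: str                   # level 2: broad transcription of the citation form
    narrow_form: Optional[str] = None    # level 3 (optional): narrow transcription as realised
    span_s: Optional[Tuple[float, float]] = None  # start/end in the signal, in seconds

    def is_reduced(self) -> bool:
        """True when a narrow transcription exists and differs from the citation form."""
        return self.narrow_form is not None and self.narrow_form != self.citation_form


@dataclass
class Utterance:
    signal_file: str                     # name of the associated speech signal file
    words: List[WordAnnotation] = field(default_factory=list)

    def orthographic_text(self) -> str:
        """Rebuild the level-1 word string of the utterance."""
        return " ".join(w.orthographic for w in self.words)


# Invented example data; the symbols are SAMPA-like and purely illustrative.
utt = Utterance(
    signal_file="example.wav",
    words=[
        WordAnnotation("and", "{nd", "n", (0.00, 0.08)),
        WordAnnotation("then", "Den", "Den", (0.08, 0.31)),
    ],
)
```

Keeping the orthographic level separate from both transcription levels means that a word which is heavily reduced in the signal still remains identifiable as a lexical unit; in this sketch an empty narrow_form string could stand for a word that has disappeared phonetically.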

Detailed phonetic transcriptions are subject to intra- and inter-transcriber variability. Furthermore, they are extremely expensive, to the extent that their cost is likely to be prohibitive for large corpora. However, recent attempts using large vocabulary speech recognisers for the acoustic decoding of speech show some promise that the process can be automated, at least to the extent that pronunciation variation can be predicted by means of general phonological and phonetic rules.
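
To make the idea of predicting pronunciation variation by rule concrete, the sketch below expands a citation form into a small set of candidate variants, from which a recogniser could in principle select the one that best matches the signal. It is a toy illustration under stated assumptions: the two rewrite rules, the SAMPA-like notation and the function name variants are hypothetical and are not taken from any particular system.

```python
import re
from typing import List, Set, Tuple

# Hypothetical optional rewrite rules over a SAMPA-like symbol string.
# Each rule is a (regular expression, replacement) pair.
RULES: List[Tuple[str, str]] = [
    (r"nd$", "n"),   # word-final /d/ deletion after /n/
    (r"@n$", "n"),   # schwa deletion before word-final /n/
]


def variants(citation: str, rules: List[Tuple[str, str]] = RULES) -> Set[str]:
    """Return the citation form plus every form obtained by optionally
    applying each rule, in the order given, to the forms collected so far."""
    forms = {citation}
    for pattern, replacement in rules:
        forms |= {re.sub(pattern, replacement, f) for f in forms}
    return forms
```

With these two rules, variants("{nd") yields the forms "{nd" and "{n", and variants("le:b@n") yields "le:b@n" and "le:bn". A realistic rule set would be far larger and would also have to cover context across word boundaries.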
In addition to phonetic detail on the segmental level, several uses of spoken language corpora may also require prosodic annotation. In this area much work remains to be done to develop commonly agreed annotation systems. Once such systems exist, one may attempt to support annotation by means of automatic recognition procedures.


EAGLES SWLG SoftEdition, May 1997.