
Orthographic identity and phonetic variability of lexicalised units

In correctly written texts, any morphologically inflected lexical item generally has just one distinct orthographic form. The words of European languages are therefore easily identified and well distinguished from each other, and there is usually only one orthographic version of each contextual form of a given word. The spoken versions of orthographically identical word forms, by contrast, show great phonetic variation in their segmental and prosodic realisation. In most European languages the phonetic form of a given word is in fact extremely variable, depending on the context and on other well-defined intervening variables such as speaking style, context of situation, and strong or weak Lombard effects (the influence of the physical environment on speech production via acoustic feedback). A given word can disappear phonetically altogether, or can be reduced to, and signalled only by, some reflection of its segmental features in the prosody of the utterance. Most of these inconspicuous variations appear only in a narrow phonetic transcription of a given pronunciation.
It makes a great difference whether a word has been uttered in isolation or in continuous speech. Only if a word is consciously and very carefully produced in isolation can we observe the explicit version of its segmental structure. These phonetically explicit forms, produced in a careful speaking style, are called citation forms or canonical forms. The segmental structure of a citation form is modified as soon as the word is integrated into connected speech (probably systematically, although relatively little of the system is currently understood). This is highly relevant for the design of spoken language corpora. It has also been taken into account in the conventions proposed by the IPA for the Computer Representation of Individual Languages (CRIL, see Appendix A).
In dealing with SL data one must know which words the speaker intended to express in a given utterance. This is reflected in the CRIL convention of the IPA (see Section 5.2.4). An SL data collection should ideally have at least two, and possibly three, different symbolically specified levels which are related to the acoustic speech signal:

  1. On the first level the words of the given utterance are identified as lexical units in their orthographic form.
  2. On the second level a broad phonetic transcription of the citation form should be given. This may be the result of automatic grapheme-to-phoneme conversion, since for very large SL corpora manual broad phonetic transcription would cost too much time and money.
  3. On a third, optional CRIL level, the way the given words were actually pronounced in a given speech signal must be specified in terms of a narrower phonetic transcription of each individual utterance. This third level can then be aligned directly, either automatically or manually, to the segments or acoustic features of the digital speech signal in the database. This information is especially relevant if multi-sensor data are also to be incorporated in SL databases.
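
The three levels listed above can be pictured as one record per word. The following Python sketch is purely illustrative and is not part of the CRIL conventions: the class names, field names and the SAMPA-like ASCII symbols are assumptions made for this example. It keeps the orthographic form (level 1), the broad transcription of the citation form (level 2), and an optional list of time-aligned segments (level 3). An empty segment list stands for a word that has disappeared phonetically, while `None` means that no narrow transcription has been made.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """One narrowly transcribed segment aligned to the speech signal (level 3)."""
    symbol: str     # narrow phonetic symbol (hypothetical SAMPA-like notation)
    start_s: float  # start time in the signal, in seconds
    end_s: float    # end time in the signal, in seconds


@dataclass
class WordAnnotation:
    """Hypothetical record holding the three annotation levels for one word."""
    orthographic: str                            # level 1: lexical unit as written
    citation_form: str                           # level 2: broad transcription of the citation form
    realisation: Optional[List[Segment]] = None  # level 3: optional narrow, aligned transcription

    def narrow_transcription(self) -> Optional[str]:
        """Join the aligned segments; None if level 3 has not been transcribed."""
        if self.realisation is None:
            return None
        return "".join(seg.symbol for seg in self.realisation)


# Illustrative values only: "and" with a reduced realisation in connected speech.
word = WordAnnotation(
    orthographic="and",
    citation_form="{nd",
    realisation=[Segment("@", 0.50, 0.54), Segment("n", 0.54, 0.60)],
)
print(word.orthographic, word.citation_form, word.narrow_transcription())
```

Keeping level 3 optional mirrors the text: the first two levels can be produced cheaply for every word, whereas the aligned narrow transcription is only added where it is available.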

Detailed phonetic transcriptions are subject to intra- and inter-transcriber variability. Furthermore, they are extremely expensive, to the extent that their cost is likely to be prohibitive for large corpora. However, recent attempts to use large-vocabulary speech recognisers for the acoustic decoding of speech show some promise that the process can be automated, at least to the extent that pronunciation variation can be predicted by means of general phonological and phonetic rules.
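
As a minimal sketch of what predicting pronunciation variation by rule could look like, the Python fragment below applies a small set of optional rewrite rules to a citation form and collects every resulting candidate. The two rules, the SAMPA-like notation and the function name `predict_variants` are hypothetical and chosen only for illustration. Selecting the candidate that best matches the signal, which is the part a recogniser would perform, is not shown.

```python
import re
from typing import List, Set, Tuple

Rule = Tuple[str, str]  # (regular expression, replacement)

# Hypothetical example rules over SAMPA-like strings; not taken from the source.
RULES: List[Rule] = [
    (r"(?<=n)d$", ""),  # optional deletion of word-final /d/ after /n/
    (r"^\{", "@"),      # optional reduction of an initial full vowel to schwa
]


def predict_variants(citation: str, rules: List[Rule]) -> Set[str]:
    """Return the citation form plus every form obtained by optionally applying each rule in order."""
    variants = {citation}
    for pattern, replacement in rules:
        # Each rule is optional, so keep the old forms and add the rewritten ones.
        variants |= {re.sub(pattern, replacement, v) for v in variants}
    return variants


print(sorted(predict_variants("{nd", RULES)))
```

Because every rule is treated as optional, the candidate set grows quickly with the number of rules; a practical system would need some way of restricting or weighting the rules.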
In addition to phonetic detail on the segmental level, several uses of spoken language corpora may also require prosodic annotation. In this area much work remains to be done to develop commonly agreed annotation systems. Once such systems exist, one may attempt to support annotation by means of automatic recognition procedures.




EAGLES SWLG SoftEdition, May 1997.