Detailed phonetic or acoustic annotation


For finer segmental annotation of speech recordings, three fundamentally different approaches are offered for discussion. All three require a separate annotation tier. In each case the labels are temporally anchored to the phonemic segment boundaries, or to the phonemic markers in the case of centre labelling.

In the first approach, the SAMPA symbols are given language-independent sound values (their IPA equivalents) and are modified by agreed diacritic codes to reflect fine phonetic detail.

Advantages:

No new segmentation or marker placement would be required, since the phonetic labels reuse the existing phonemic boundaries.

Disadvantages:

Different symbols would sometimes be required at the phonemic and phonetic levels, particularly for vowels. For instance, Danish /{/ might have to be represented by phonetic [E]; English /{/ might have to be represented by phonetic [a] or even [A], depending on regional accent.
Diacritic symbols would have to be agreed for all partner languages.
A single-keyboard ASCII coding might not suffice for all the necessary IPA symbols and diacritics, and many of the chosen symbols would have little or no mnemonic value.
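The following is a minimal sketch of the first approach. It assumes a simple in-memory tier representation; the `Segment` class, the `phonetic_tier` function and the variety keys are hypothetical. The phonetic tier is produced by copying the phonemic boundaries and replacing only the labels. The relabelling table holds just the examples cited above, because a real table would have to be agreed per language and accent. Diacritic codes are omitted since none have been agreed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: int  # segment start in ms
    end: int    # segment end in ms
    label: str  # SAMPA symbol


# Hypothetical phonemic -> phonetic relabelling, keyed by (variety, phoneme).
# Only the examples from the text are included; unlisted phonemes keep their label.
PHONETIC_VALUE = {
    ("danish", "{"): "E",
    ("english_accent_1", "{"): "a",
    ("english_accent_2", "{"): "A",
}


def phonetic_tier(phonemic, variety):
    """Build a phonetic tier that reuses the phonemic boundaries unchanged."""
    return [
        Segment(seg.start, seg.end, PHONETIC_VALUE.get((variety, seg.label), seg.label))
        for seg in phonemic
    ]


# Usage: an illustrative three-segment token with made-up boundaries (ms).
phonemic = [Segment(0, 80, "k"), Segment(80, 210, "{"), Segment(210, 300, "t")]
print([s.label for s in phonetic_tier(phonemic, "danish")])  # ['k', 'E', 't']
```

No boundary is moved, which is the advantage claimed for this approach. The work lies entirely in agreeing the relabelling table and the diacritics.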

In the second approach, the SAMPA phonemic values are retained for each language, and each phonemic segment is subdivided into acoustically quasi-homogeneous elements. For example, /k/ may contain a partially voiced closure, a clear burst and a period of aspiration before the vowel onset. This approach is thus an acoustic-event labelling, and it is used in a similar way at CERFIA, IES and UCL. The following characterisation retains the primary symbol as a "pointer" to the phonemic identity of the segment:

kv = Voiced portion of closure

kc = Voiceless portion of closure

kp = k-burst

ka = k-aspiration
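Because each label keeps the phonemic symbol as a pointer, the phonemic segmentation can be recovered from the acoustic-event tier alone. The sketch below rests on two hypothetical assumptions:

- every event label is a SAMPA symbol followed by exactly one of the four event codes listed above;
- boundaries are stored as integer milliseconds.

`split_label` and `phonemic_from_events` are illustrative names, not part of SAMPA.

```python
from dataclasses import dataclass

# Event codes from the /k/ example above; assumed to be the last character
# of each label, with the SAMPA "pointer" symbol (possibly multi-character) before it.
EVENT_CODES = {
    "v": "voiced portion of closure",
    "c": "voiceless portion of closure",
    "p": "burst",
    "a": "aspiration",
}


@dataclass(frozen=True)
class Segment:
    start: int  # ms
    end: int    # ms
    label: str


def split_label(label):
    """Split e.g. 'kv' into ('k', 'v'); raise ValueError for non-event labels."""
    pointer, code = label[:-1], label[-1:]
    if not pointer or code not in EVENT_CODES:
        raise ValueError(f"not an acoustic-event label: {label!r}")
    return pointer, code


def phonemic_from_events(events):
    """Merge runs of contiguous event segments that share a phonemic pointer.

    Caveat: two identical phonemes in direct succession (e.g. /k k/)
    would be fused into one segment by this simple rule.
    """
    merged = []
    for seg in events:
        pointer, _ = split_label(seg.label)
        if merged and merged[-1].label == pointer and merged[-1].end == seg.start:
            merged[-1] = Segment(merged[-1].start, seg.end, pointer)  # extend
        else:
            merged.append(Segment(seg.start, seg.end, pointer))  # new phoneme
    return merged


events = [Segment(0, 20, "kv"), Segment(20, 55, "kc"), Segment(55, 60, "kp"), Segment(60, 85, "ka")]
print(phonemic_from_events(events))  # [Segment(start=0, end=85, label='k')]
```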

Advantages:

The acoustic realisation of each phonemic segment is defined in greater detail than is possible even in narrow phonetic transcription, where, for example, a partially voiced closure cannot be easily represented.

Disadvantages:

New segment markers have to be set.
The system can only apply to approaches that recognise the need to define segment boundaries (however arbitrary they may be theoretically).

Note: the two-symbol representation given above is redundant, because the acoustic-event categories are common to phoneme classes rather than to individual phonemes. For example, pc, tc and kc would all denote a period of voiceless closure and so would not need the place specification. Moreover, if the phonemic category is specified in a separate annotation tier, it is recoverable. It can then be used for database searches, for instance with a view to developing a set of rules covering the possible "internal" structures of stretches of signal associated with a particular phoneme. At present, however, some partners need to retain the "phonemic pointer" in order to derive the phonemic label file from the lower-level acoustic-event file.
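Suppose the event tier holds only class-level codes (`c` rather than `pc`, `tc`, `kc`) and the phonemes sit in their own tier. Both the two-symbol labels and the per-phoneme "internal" structures can then be recovered by time alignment. The sketch below makes three assumptions:

- boundaries are integer milliseconds;
- event segments are listed in time order;
- each event lies entirely within one phonemic segment.

All names are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: int  # ms
    end: int    # ms
    label: str


def containing_index(event, phonemic):
    """Index of the phonemic segment whose interval contains the event."""
    for i, ph in enumerate(phonemic):
        if ph.start <= event.start and event.end <= ph.end:
            return i
    raise ValueError(f"event {event} crosses a phonemic boundary")


def two_symbol_labels(events, phonemic):
    """Rebuild pointer labels such as 'pc' from class-level codes such as 'c'."""
    return [
        Segment(ev.start, ev.end, phonemic[containing_index(ev, phonemic)].label + ev.label)
        for ev in events
    ]


def internal_structures(events, phonemic):
    """Count the event-code sequences observed inside each phoneme token."""
    per_token = defaultdict(list)
    for ev in events:
        per_token[containing_index(ev, phonemic)].append(ev.label)
    return Counter((phonemic[i].label, tuple(codes)) for i, codes in per_token.items())


phonemic = [Segment(0, 60, "p"), Segment(60, 140, "{"), Segment(140, 200, "t")]
events = [Segment(0, 50, "c"), Segment(50, 60, "p"), Segment(140, 190, "c"), Segment(190, 200, "p")]
print([e.label for e in two_symbol_labels(events, phonemic)])  # ['pc', 'pp', 'tc', 'tp']
print(internal_structures(events, phonemic))
```

Run over a whole database, `internal_structures` gives the kind of inventory of phoneme-internal event sequences from which such rules could be developed.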

A third approach, favoured by the linguistic group at ICP (Grenoble), recognises transitional phases between areas marked as optimally representative of a particular phoneme category. The finer labelling requires the delimitation of the (centre-marked) optimal area, which thereby also delimits the area of coarticulation.
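For the third approach, the transitional phases follow directly once the optimal (centre-marked) areas have been delimited: they are the gaps between consecutive optimal areas. The sketch below assumes integer millisecond boundaries and optimal areas sorted in time. The `Area` class, the `with_transitions` function and the `A>B` label notation for transitions are hypothetical and are not part of the ICP scheme as described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    start: int  # ms
    end: int    # ms
    label: str


def with_transitions(optimal):
    """Interleave optimal areas with the transitional phases between them.

    A transition is labelled 'A>B' (hypothetical notation) and spans the gap
    from the end of one optimal area to the start of the next.
    """
    out = []
    for cur, nxt in zip(optimal, optimal[1:]):
        out.append(cur)
        if cur.end < nxt.start:  # no transition if the areas touch or overlap
            out.append(Area(cur.end, nxt.start, f"{cur.label}>{nxt.label}"))
    if optimal:
        out.append(optimal[-1])
    return out


optimal = [Area(10, 40, "k"), Area(70, 150, "{"), Area(180, 230, "t")]
print([(a.start, a.end, a.label) for a in with_transitions(optimal)])
# [(10, 40, 'k'), (40, 70, 'k>{'), (70, 150, '{'), (150, 180, '{>t'), (180, 230, 't')]
```

No single "changeover point" between phonemes appears anywhere in this output. Each stretch of indeterminacy is a labelled interval of its own.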

Advantages:

The theoretically doubtful "changeover point" from one "phoneme" segment to another is avoided, and areas of indeterminacy are identified.

Disadvantages:

New markers have to be set.

Each of these approaches would provide an annotation that is closer to the (acoustic-)phonetic realisation of the utterance than the phonemic SAMPA labels. For the development of speech knowledge in general, and for the definition of rules describing the structure of continuous speech in particular, a more detailed annotation is essential: it is the symbolic bridge between measurable acoustic parameters and abstract phonological categories. Which approach is selected for more detailed annotation within the SAM project depends on the use to which it will be put. Essentially, the closer a symbolic representation comes to significant acoustic events (where "significant" is an application-dependent term), the more useful it will be for speech-knowledge acquisition and rule development. Both synthesis assessment and recognition assessment stand to benefit.


EAGLES SWLG SoftEdition, May 1997.