For finer segmental annotation of speech recordings, three fundamentally different approaches are offered for discussion. All three require a separate annotation tier, but their labels are anchored in time to the phonemic segment boundaries (or to the phonemic markers in the case of centre labelling).
In the first approach, the SAMPA symbols are given language-independent sound values (their IPA equivalents) and are modified by agreed diacritic codes to reflect fine phonetic detail.
In the second approach, the SAMPA phonemic values are retained for each language, and the phonemic segment is subdivided into acoustically quasi-homogeneous elements. For example, /k/ may contain a partially voiced closure, a clear burst, and a period of aspiration before the vowel onset. This approach amounts to acoustic-event labelling and is used in a similar way at CERFIA, IES and UCL.
The following characterisation retains the primary symbol as a "pointer" to the phonemic identity of the utterance:
| kv | = | Voiced portion of closure |
| kc | = | Voiceless portion of closure |
Note: The two-symbol representation given above is redundant, because the acoustic-event categories are common to phoneme classes rather than to individual phonemes: pc, tc, and kc would all denote a period of voiceless closure and therefore do not require the place specification. Moreover, if the phonemic category is specified in a different annotation tier, it is recoverable from that tier and can be used for a database search, e.g. with a view to developing a set of rules covering the possible "internal" structures of stretches of signal associated with a particular phoneme. At present, some partners need to retain the "phonemic pointer" in order to derive the phonemic label file from the lower-level acoustic-event file.
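The derivation of a phonemic label file from the acoustic-event file, mentioned above, might be sketched as follows. This is only an illustration under stated assumptions, not the procedure used by any SAM partner: the `Segment` record, the function names, and the convention that the final character of each event label is the event code (so that everything before it is the phonemic pointer) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    label: str    # event label such as "kv" or "kc", or a phonemic label such as "k"
    start: float  # start time in seconds
    end: float    # end time in seconds


def phonemic_pointer(event_label: str) -> str:
    """Return the phonemic pointer of an event label.

    Assumption (not from the source): the last character is the event code,
    and everything before it is the SAMPA phonemic symbol.
    """
    if len(event_label) < 2:
        raise ValueError(f"label {event_label!r} has no pointer/code structure")
    return event_label[:-1]


def derive_phonemic_tier(events: List[Segment]) -> List[Segment]:
    """Merge consecutive events that share a phonemic pointer into one segment."""
    phonemes: List[Segment] = []
    for ev in events:
        pointer = phonemic_pointer(ev.label)
        if phonemes and phonemes[-1].label == pointer:
            # Same phoneme continues: extend the current phonemic segment.
            phonemes[-1].end = ev.end
        else:
            # A new phoneme starts at this event.
            phonemes.append(Segment(pointer, ev.start, ev.end))
    return phonemes


if __name__ == "__main__":
    # Invented example times; the labels kv, kc and tc follow the text above.
    events = [
        Segment("kv", 0.10, 0.13),
        Segment("kc", 0.13, 0.18),
        Segment("tc", 0.18, 0.25),
    ]
    for seg in derive_phonemic_tier(events):
        print(seg.label, seg.start, seg.end)
```

One limitation of this sketch is that two adjacent instances of the same phoneme would be merged into a single segment, which is one reason a separate phonemic tier may be preferable to relying on the pointer alone.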
A third approach, favoured by the linguistic group at ICP (Grenoble), recognises transitional phases between areas marked as optimally representative of a particular phoneme category. The finer labelling requires the delimitation of the (centre-marked) optimal area, which in turn delimits the area of coarticulation.
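The relation between optimal areas and coarticulation areas can be illustrated with a small sketch. It assumes, purely for illustration, that each optimal area is stored as a label with start and end times, and that any gap between two consecutive optimal areas is treated as the transitional phase; the `Area` record, the function name, and the `left>right` transition label are hypothetical and do not reflect the ICP conventions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Area:
    label: str    # phoneme label, or a "left>right" transition label
    start: float  # start time in seconds
    end: float    # end time in seconds


def transition_regions(optimal: List[Area]) -> List[Area]:
    """Return the stretches of signal lying between consecutive optimal areas."""
    transitions: List[Area] = []
    for left, right in zip(optimal, optimal[1:]):
        if right.start > left.end:
            # The gap between two optimal areas is taken as the coarticulation area.
            transitions.append(
                Area(f"{left.label}>{right.label}", left.end, right.start)
            )
    return transitions


if __name__ == "__main__":
    # Invented example times for three optimal areas.
    optimal = [
        Area("k", 0.10, 0.14),
        Area("a", 0.20, 0.30),
        Area("t", 0.30, 0.36),
    ]
    for tr in transition_regions(optimal):
        print(tr.label, tr.start, tr.end)
```

In this sketch, optimal areas that abut directly produce no transition region, so only the delimited optimal areas need to be labelled by hand.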
Each of these approaches would provide an annotation that is closer to the (acoustic-)phonetic realisation of the utterance than the phonemic SAMPA labels. For the development of speech knowledge in general, and for the definition of rules describing the structure of continuous speech in particular, a more detailed annotation is essential: it is the symbolic bridge between measurable acoustic parameters and abstract phonological categories. Which approach is selected for more detailed annotation within the SAM project depends on the use to which the annotation will be put. Essentially, the closer a symbolic representation comes to significant acoustic events (where "significant" is an application-dependent term), the more useful it will be in speech-knowledge acquisition and rule development. Both synthesis and recognition assessment can only gain.