This chapter discusses the linguistic representation of spoken language corpora. As stated in Chapter 3, one of the factors that determine whether a collection of speech is a speech corpus is that the collection is augmented with linguistic annotation (i.e. a symbolic representation of the speech). Since it is impossible to examine the sampled speech data directly, it is only by means of this symbolic representation that one is able to navigate through the corpus. It is important to note that all types of representation of speech are the result of an analysis or classification of the speech. The representations are not the speech itself, but an abstraction from it. However, they are sometimes used as if they were the speech itself.
In most cases, the symbolic representation of the speech implies that a transcription of the speech is made. Transcriptions are used in many fields of linguistics, including phonetics, phonology, dialectology, sociolinguistics, psycholinguistics, second language teaching, and speech pathology. Transcriptions are also used in disciplines like psychology, anthropology, and sociology. The type of transcription very much depends on its purpose. In particular, this purpose determines the degree of detail that is required. For example, if a speech corpus has been designed to investigate the amount of time several speakers are speaking simultaneously in a dialogue, a very global transcription will be sufficient. If a corpus has been collected to establish differences in pronunciations of words, one needs to have a very precise segmental transcription.
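The simultaneous-speech example can be made concrete with a minimal sketch. It assumes a hypothetical global transcription in which each turn is recorded only as a speaker label with start and end times in seconds; the tuple layout and the function name overlap_duration are illustrative and do not come from any particular corpus format. It also assumes that turns of the same speaker do not overlap one another.

```python
from typing import List, Tuple

# Hypothetical turn-level annotation: (speaker, start_s, end_s).
Turn = Tuple[str, float, float]


def overlap_duration(turns: List[Turn]) -> float:
    """Return the total time (in seconds) during which two or more turns are active."""
    events = []
    for _speaker, start, end in turns:
        events.append((start, 1))   # a turn begins
        events.append((end, -1))    # a turn ends
    # Sort by time; at equal times, ends (-1) are processed before starts (+1),
    # so turns that merely touch are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    active = 0        # number of turns currently in progress
    prev_time = 0.0   # time of the previous event
    total = 0.0
    for time, delta in events:
        if active >= 2:
            total += time - prev_time
        active += delta
        prev_time = time
    return total


# Illustrative data only: speaker B starts before A has finished, and vice versa.
example_turns = [("A", 0.0, 5.0), ("B", 4.0, 8.0), ("A", 7.5, 10.0)]
example_overlap = overlap_duration(example_turns)
```

No segmental information is needed for this measurement: turn boundaries alone are enough, which is why a global transcription suffices for this purpose.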
Detailed phonemic or phonetic transcriptions of large-scale spoken language corpora with many speakers and much (spontaneous) speech are not achievable in practice, because they would be too time-consuming and expensive to produce. Most large speech corpora are therefore provided with word-for-word transcriptions, i.e. word-level orthographic representations of what has been said (e.g. the ATIS and Switchboard corpora). However, a medium-sized corpus of read speech can be provided with a segmental transcription and even with labelling at the segmental level. Examples are the American English TIMIT corpus, which consists of 630 speakers each reading 10 sentences, the German PHONDAT corpora (1990 and 1992, both read speech), and the German VERBMOBIL corpus (from 1993, spontaneous speech). An orthographic transcription (sometimes referred to as a transliteration) may be converted into a canonical phonemic transcription by means of a grapheme-phoneme converter or a pronunciation table.
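The conversion by means of a pronunciation table can be sketched as a simple word-by-word lookup. The following is a minimal illustration only: the table entries, the SAMPA-like symbols, the placeholder for unknown words, and the function name canonical_transcription are all assumptions made for the example, not part of any of the corpora mentioned above.

```python
from typing import Dict, List

# Hypothetical pronunciation table: orthographic word -> canonical phonemic form.
# The SAMPA-like symbols are chosen for illustration only.
PRONUNCIATION_TABLE: Dict[str, str] = {
    "the": "D @",
    "cat": "k { t",
    "sat": "s { t",
}


def canonical_transcription(
    orthographic: str,
    table: Dict[str, str],
    unknown: str = "<unk>",
) -> List[str]:
    """Map each word of an orthographic transcription to its canonical form.

    Words missing from the table are marked with a placeholder, so that they
    can be passed to a grapheme-phoneme converter or checked by hand.
    """
    words = orthographic.lower().split()
    return [table.get(word, unknown) for word in words]


phonemic = canonical_transcription("The cat sat down", PRONUNCIATION_TABLE)
```

The result is canonical in the sense that every occurrence of a word receives the same phonemic form, regardless of how the speaker actually pronounced it.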
It has been found that providing reliable phonetic transcriptions for large corpora is hardly feasible [Cucchiarini (1993)]. However, detailed transcriptions of a small number of specific phenomena (e.g. presence/absence of diphthongisation, voiced/voiceless character of fricatives) can be made relatively quickly and reliably if the occurrences of these phenomena can be retrieved quickly with the aid of the annotation and the direct access to files offered by a computerised speech corpus [Van Hout (1989), Van Bezooijen & Van Hout (1985)].
During the International Conference on Spoken Language Processing (ICSLP) in Banff, Canada, in 1992, a workshop was held on "Orthographic and Phonetic Transcription". The goal of the workshop was to agree on areas where community-wide conventions were needed, to identify and document current work, and to establish a means of future communication and continued cooperation.
In the remainder of this section some general remarks will be made about transcriptions of read speech versus transcriptions of spontaneous speech. In addition, the levels and types of transcription will be introduced. The next section (Section 5.2) gives some background on the task of segmenting and labelling speech. Section 5.3 then discusses the levels and types of representation in detail. For each level, reference will be made to existing corpora where possible, the symbols to be used will be presented, and recommendations will be given.