Next: Spoken language characterisation Up: SL corpus representation Previous: Non-linguistic and other phenomena

List of recommendations

  1. For the transcription of dialogues between more than two speakers, use a "music score notation".
  2. For orthographic transcriptions, use the standard spelling as much as possible.
  3. Indicate reduced word forms in orthographic transcriptions (a) if these forms occur frequently and (b) if they involve syllable deletion.
  4. Use at least two types of "filler" syllable: one vowel-like type uh, and one nasal type mm.
  5. Non-speech acoustic events should be annotated at the correct location in the utterance, by first transcribing the words and then indicating which words are simultaneous with the acoustic events.
  6. When orthographic transcription is used in a corpus, it is recommended that a list of unique words and word forms be generated from the transcription. The orthographic forms of the words can then be converted to phonemes by means of computerised grapheme-to-phoneme conversion. The result of this process is a list of citation forms, also called canonical forms or citation-phonemic forms. These forms represent the pronunciation of words when spoken in isolation, and do not cover the variations in pronunciation found in running speech. However, this procedure will at least give a standard pronunciation as a starting-point. This is especially relevant if a corpus is to be used by people who do not belong to that language community. On the basis of these canonical forms, phonetic transcriptions can be made semi-automatically using large vocabulary speech recognisers.
  7. Unless there is a compelling reason to do so, do not start to transcribe a corpus phonetically, since the time spent on this will never be recovered. If very specific phonetic details are needed, one is advised to look for these on the basis of orthographic and/or phonemic transcriptions.
  8. It is recommended that transcribers give information about the process of transcribing and about the speech that they have transcribed. Some speakers will be easier to transcribe than others. This will depend on the speech rate, the clarity of articulation, the amount of hesitation, and the number of dialect words used by the speakers. Some information about the difficulty of the transcription is very useful for later queries. The transcribers of the Switchboard (telephone) Corpus were asked to rate the following characteristics of a conversation on a scale ranging from 1 to 5: difficulty, topicality, naturalness, echo from B (when listening to A separately, B could hardly be heard (1) or was nearly as loud as A (5)), echo from A, static on A (no static noise (1) or a great deal of it (5)), static on B, background A, and background B.
  9. In the case of transcriptions at more than one level (e.g. orthographic transcription with some prosodic marks and indications of hesitations etc.), the recommendation is to listen to one level at a time. In everyday life, listeners are accustomed to ignoring hesitations, false starts, and other imperfections, and also do not pay explicit attention to prosody. Transcribers must learn to hear all these events. It seems easiest to listen to the words first and transcribe these, and then to assign the prosodic marks and other annotations.
  10. For orthographic transcriptions it is not necessary to find experienced transcribers. However, for phonemic and phonetic transcriptions it is necessary to use transcribers who are accustomed to listening to speech in a very precise, analytical way.
  11. To give some indication of the time needed to transcribe speech, here are some examples. An orthographic transcription of spontaneous speech takes about ten times the duration of the speech itself. An orthographic transcription of read sentences takes about three times the duration of the speech, and an orthographic transcription of read texts about five times the duration of the speech.
  12. Checking of transcription is always necessary. Checking can be done in different ways. An independent transcriber can transcribe the whole corpus or a sample of it. Another possibility is to have someone else check the transcription by reading it while listening to the speech. This is less time-consuming. In the case of the latter procedure, it is recommended that the transcription be checked in the opposite order to that used by the first transcriber, since towards the end of the material the first transcriber will be more self-consistent than at the beginning. Inconsistencies may occur in the conventions used (spelling and annotation conventions, such as brackets), as well as in what is heard by the two different persons.
  13. For the label file format, use any format that can easily be converted to a WAVES label file, for the sake of portability across different systems.
  14. Any accuracy measure based on inter-transcriber consistency must control for the factors "level of transcription", "segment type", and "task type" (whether segmentation or labelling).
  15. If the corpus is confined to one language, and if the labelling is to use alphabetic rather than true IPA symbols, then it is advisable to use a language-specific set of characters. This avoids the notational complexity necessary when all symbols must be kept distinct across all languages, as is needed in the study of general phonetics.
  16. When transcribing prosodically, the provisional recommendation is to use either the ToBI or the IPO system (and the MARSEC system if a purely auditory transcription is being carried out). If the language to be transcribed is not English, and especially if the projected application of the prosodic transcription is in the field of speech technology, then it is probably best to use the IPO system if possible (i.e. if the basic "grammar" of contours has already been researched for that language).
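
The first steps of recommendation 6 (building a list of unique word forms and attaching citation forms to them) might be sketched as follows. This is only an illustration under stated assumptions: the function names, the tokenisation rule, and the toy lexicon with SAMPA-like strings are all hypothetical and are not taken from any real tool or pronunciation dictionary. A dictionary lookup stands in here for a real grapheme-to-phoneme converter.

```python
import re
from collections import Counter


def unique_word_forms(transcript_lines):
    """Count the lower-cased word forms in an orthographic transcription."""
    counts = Counter()
    # A word is a run of letters, optionally joined by apostrophes or hyphens
    # (an assumed tokenisation rule; real corpora define their own).
    word_pattern = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")
    for line in transcript_lines:
        counts.update(word_pattern.findall(line.lower()))
    return counts


def citation_forms(words, lexicon):
    """Split words into those with a citation form and those still missing one."""
    found, missing = {}, []
    for word in sorted(words):
        if word in lexicon:
            found[word] = lexicon[word]
        else:
            missing.append(word)  # left for manual or rule-based conversion
    return found, missing


# Toy lexicon: the entry is illustrative only, not from a real dictionary.
toy_lexicon = {"know": "n @U"}

counts = unique_word_forms(["Uh, I don't know.", "I know, mm."])
found, missing = citation_forms(counts, toy_lexicon)
```

The list of missing words shows which forms still need a pronunciation. Filled pauses such as uh and mm will typically end up there and need their own convention.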

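The rough time factors in recommendation 11 can be turned into a simple planning estimate. The sketch below is a back-of-the-envelope aid only: the dictionary keys and the function name are made up for illustration, and the factors are the approximate figures quoted above for orthographic transcription, so actual effort will vary with speakers and recording quality.

```python
# Approximate ratio of transcription time to speech duration
# (orthographic transcription only, figures from recommendation 11).
REAL_TIME_FACTORS = {
    "spontaneous": 10,
    "read_sentences": 3,
    "read_texts": 5,
}


def estimate_transcription_hours(speech_hours_by_type):
    """Sum the estimated transcription time over all speech types."""
    return sum(
        REAL_TIME_FACTORS[speech_type] * hours
        for speech_type, hours in speech_hours_by_type.items()
    )


# Example: 2 hours of spontaneous speech and 1.5 hours of read sentences.
total = estimate_transcription_hours({"spontaneous": 2, "read_sentences": 1.5})
# total == 24.5 (2 * 10 + 1.5 * 3)
```

The estimate excludes checking (recommendation 12), which adds further time.
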

EAGLES SWLG SoftEdition, May 1997.