- For the transcription of dialogues between more than two speakers, use a
"music score notation".
- For orthographic transcriptions, use the standard spelling as much as
possible.
- Indicate reduced word forms in orthographic transcriptions a) if these forms occur frequently and b) if they involve
syllable deletion.
- Use at least two types of "filler" syllable: one vowel-like type
(uh) and one nasal type (mm).
- Non-speech acoustic events should be annotated at the correct location
in the utterance, by first transcribing the words and then indicating which
words are simultaneous with the acoustic events.
- When orthographic transcription is used in a
corpus, it is recommended that a list of unique words and word forms is
generated on the basis of the transcription. The
orthographic forms of the words can then be converted to phonemes
by means of computerised grapheme-to-phoneme
conversion. The result of this process is a list of
citation forms, also called canonical
forms or citation-phonemic forms. These forms represent the pronunciation of
words when spoken in isolation, and do not cover variations in pronunciation
found in running speech. However, this procedure will at least give a standard
pronunciation as a starting point. This is especially relevant if the corpus is
to be used by people from outside that language community. On
the basis of these canonical forms, phonetic transcriptions can be made
semi-automatically using large vocabulary
speech recognisers.
- If there is no compelling reason otherwise, do not start to transcribe a corpus
phonetically, since the time spent on this will never be recovered. If very
specific phonetic details are needed, one is advised to look for these on
the basis of orthographic and/or
phonemic transcriptions.
- It is recommended that transcribers give information about the process of
transcribing and about the speech that they have transcribed. Some speakers
will be easier to transcribe than others. This will
depend on the speech rate, the clarity of articulation, the amount of
hesitation, and the number of dialect words used by the speakers. Some
information about the difficulty of the transcription is
very useful for later queries. The transcribers of the Switchboard (telephone)
Corpus were asked to rate the following characteristics of each conversation on
a scale from 1 to 5: difficulty, topicality, naturalness, echo from B (when
listening to A separately, B could hardly be heard (1) or was nearly as loud
as A (5)), echo from A, static on A (no static noise (1) or a great deal of it
(5)), static on B, background A, and background B.
- In the case of transcriptions at more than one level (e.g.
orthographic transcription with some prosodic marks and indications of
hesitations etc.), the recommendation is to listen to one level at a time. In
everyday life, listeners are accustomed to ignoring hesitations, false starts,
and other imperfections, and also do not pay explicit attention to prosody.
Transcribers must learn to hear all these events. It seems easiest to listen to
the words first and transcribe these, and then to assign the prosodic marks and
other annotations.
- For orthographic transcriptions it is not
necessary to find experienced transcribers. However, for
phonemic and phonetic transcriptions it is necessary to use
transcribers who are accustomed to listening to speech in a very precise,
analytical way.
- As an indication of the time needed to transcribe speech: an orthographic
transcription of spontaneous speech takes about ten times the duration of the
speech itself, an orthographic transcription of read sentences about three
times the duration of the speech, and an orthographic transcription of read
texts about five times the duration of the speech.
- Checking of transcription is always necessary. Checking
can be done in different ways. An independent transcriber can transcribe the
whole or a sample of the corpus. Another possibility is to allow someone else
to check the transcription by reading the
transcription and listening to the speech. This is less
time-consuming. In the case of the latter procedure, it is recommended that the
transcription be checked in the opposite order to that used
by the first transcriber, since towards the end of the material the first
transcriber will be more self-consistent than at the beginning. Inconsistencies
may occur in the conventions used (spelling and annotation conventions such as
brackets), as well as in what the two persons hear.
- For the label file format, use any format that can easily be converted to a
WAVES label file, for the sake of portability across different systems.
- Any accuracy measure based on inter-transcriber consistency must control for the
factors "level of transcription", "segment type", and "task type" (whether
segmentation or labelling).
- If the corpus is confined to one language, and if the labelling is to use
alphabetic characters rather than true IPA symbols, then it is advisable to use a
language-specific set of characters. This avoids the notational complexity
necessary when all symbols must be kept distinct across all languages, as is
needed in the study of general phonetics.
- When transcribing prosodically, the provisional recommendation is to use
either the ToBI or the IPO system (and the MARSEC system if a purely auditory
transcription is being carried out). If the language to be transcribed is not
English, and especially if the projected application of the prosodic
transcription is in the field of speech technology, then it is probably best to
use the IPO system if possible (i.e. if the basic "grammar" of contours has
already been researched for that language).
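
As an illustration of the recommendation above to generate a list of unique words and word forms from an orthographic transcription, the following Python sketch may be helpful. The function name `unique_word_forms`, the tokenisation rule (letters, with internal apostrophes and hyphens kept), and the sample lines are assumptions made for this example only. The grapheme-to-phoneme step is not shown, since it depends on the language and on the conversion tool used.

```python
import re
from collections import Counter

# Assumed tokenisation rule: a word is a run of letters, optionally
# continued by an apostrophe or hyphen followed by more letters.
WORD_PATTERN = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def unique_word_forms(transcription_lines):
    """Count each lower-cased word form found in the transcription lines."""
    counts = Counter()
    for line in transcription_lines:
        counts.update(WORD_PATTERN.findall(line.lower()))
    return counts


# Hypothetical sample transcription, including the two filler types.
lines = ["uh the cat sat on the mat", "The cat's mat, mm, is red."]
word_list = sorted(unique_word_forms(lines))
print(word_list)
```

The sorted list is what would be passed to a grapheme-to-phoneme converter to obtain canonical forms. The counts can also help to decide which reduced word forms occur frequently enough to be indicated in the transcription.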
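
The rough time factors given above for orthographic transcription (about ten, three, and five times the duration of the speech) can be turned into a simple planning estimate. The sketch below is only an illustration: the dictionary keys, the function name `estimate_transcription_hours`, and the corpus sizes are hypothetical, and the factors are the approximate figures quoted in the recommendation, not measured values.

```python
# Approximate real-time factors for orthographic transcription,
# taken from the recommendation above.
REAL_TIME_FACTORS = {
    "spontaneous": 10,
    "read_sentences": 3,
    "read_texts": 5,
}


def estimate_transcription_hours(speech_hours, speech_type):
    """Estimate transcription time in hours for one type of speech."""
    return speech_hours * REAL_TIME_FACTORS[speech_type]


# Hypothetical corpus: hours of recorded speech per speech type.
corpus = {"spontaneous": 2.0, "read_sentences": 4.0, "read_texts": 1.0}
total_hours = sum(
    estimate_transcription_hours(hours, speech_type)
    for speech_type, hours in corpus.items()
)
print(total_hours)
```

The estimate covers the first orthographic transcription only. Checking, and any phonemic or phonetic transcription, would need to be budgeted separately.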
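
One way to control for the factors "level of transcription", "segment type", and "task type" when measuring inter-transcriber consistency is to report agreement separately for each combination of the three. The sketch below assumes plain percentage agreement, which is a choice made for this example and not a measure prescribed by the recommendation. The function name `agreement_by_stratum` and the sample data are hypothetical, and the caller decides what counts as agreement (for instance identical labels, or boundaries within some tolerance).

```python
from collections import defaultdict


def agreement_by_stratum(comparisons):
    """Return the agreement rate for each (level, segment type, task type).

    Each comparison is a tuple (level, segment_type, task_type, agree),
    where agree is True if the two transcribers agreed on that item.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for level, segment_type, task_type, agree in comparisons:
        key = (level, segment_type, task_type)
        totals[key] += 1
        if agree:
            matches[key] += 1
    return {key: matches[key] / totals[key] for key in totals}


# Hypothetical comparisons between two transcribers.
comparisons = [
    ("phonemic", "vowel", "labelling", True),
    ("phonemic", "vowel", "labelling", False),
    ("phonemic", "plosive", "labelling", True),
    ("phonetic", "vowel", "segmentation", False),
]
rates = agreement_by_stratum(comparisons)
for key, rate in sorted(rates.items()):
    print(key, rate)
```

Keeping the strata separate avoids a single pooled figure in which, for example, easy segment types at a broad level of transcription mask poor agreement on difficult ones.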
EAGLES SWLG SoftEdition, May 1997.