Following the Dutch POLYPHONE experience, each utterance may be assessed independently of the transcription, after the transcription has been completed. The assessments will be stored in a separate log file. Four different assessment types will be used:
A file is to be marked as ``Garbage'' if
Garbage files should not be retained.
Files are marked ``Noise'' if they contain clearly audible background noise in addition to the speech. A `hard' criterion triggering this rating can be the failure of the recording platform to stop recording after the speaker completed the utterance (the platform can be set to consider two seconds of `silence' as end of utterance).
Files are marked ``OTHER'' if they contain
Files are rated OK in all other conditions. Note that OK does not mean that the subject adhered exactly to the prompting text in read items; if the subject said something else without hesitating and there is no high-level background noise, the item is rated OK. Also, utterances trimmed at the beginning or end are rated OK, provided that the first or last word present in the file is in no way damaged.
If a file can be rated both as ``NOISE'' and ``OTHER'', it must be rated ``OTHER''. Each file must be given exactly one rating.
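The precedence rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `resolve_rating`, its boolean inputs, and the upper-case label spellings are hypothetical and not part of the specification. The sketch also assumes that a file judged to be garbage is rated ``Garbage'' regardless of the other findings, which the text does not state explicitly.

```python
from enum import Enum


class Rating(Enum):
    # Label spellings are an assumption; the text uses mixed casing.
    GARBAGE = "GARBAGE"
    NOISE = "NOISE"
    OTHER = "OTHER"
    OK = "OK"


def resolve_rating(is_garbage: bool, has_noise: bool, has_other: bool) -> Rating:
    """Return exactly one rating for a file (hypothetical helper).

    The three flags stand for the assessor's judgements; how each
    judgement is reached is outside the scope of this sketch.
    """
    # Assumption: garbage overrides everything, since such files are not retained.
    if is_garbage:
        return Rating.GARBAGE
    # Stated rule: a file that qualifies as both NOISE and OTHER is rated OTHER.
    if has_other:
        return Rating.OTHER
    if has_noise:
        return Rating.NOISE
    # Files are rated OK in all other conditions.
    return Rating.OK
```

Because the function returns a single value on every path, each file receives exactly one rating, as required.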
It can be useful to provide an opportunity to comment on speaker characteristics that are helpful for later analysis and selection of utterances; such a comment could be stored once for all calls by that speaker. Examples are a foreign or non-native accent; very unclear, quiet or loud speech; and especially stuttering or other serious production problems, markedly poor voice quality, or uncooperative speakers whose data is not useful for training or testing. Such comments could optionally be marked here and later included in the log file for the transcriptions or in the speaker description file.