This appendix presents the transcription conventions for the SPEECHDAT corpora. They take as their starting point the conventions used by LDC/ARPA in producing the ATIS CD-ROMs. The project decided to simplify the transcription task so that it can be performed quickly while still representing the most important acoustic events adequately for training and testing automatic speech recognisers.
The SPEECHDAT corpora cover seven languages: English, German, French, Spanish, Portuguese, Italian, and Danish. The final section of this appendix describes some language-specific issues and choices. The documentation accompanying each language database will, however, fully describe all optional conventions and transcriptions used.