Presently the following German speech corpora are available on ISO 9660 CD-ROM:
The corpus contains read speech from 10 different speakers. Each speaker read approx. 1000 sentences from a German newspaper corpus, giving a total of approx. 10000 recorded utterances.
The corpus contains read speech from 101 different speakers. Each speaker read approx. 100 sentences from either the SZ subcorpus or the CeBit subcorpus. The language is German. The SZ subcorpus contains 544 sentences from newspaper articles. The CeBit subcorpus contains 483 sentences from newspaper articles about CeBit 1995. Each subcorpus is divided into 5 parts of approx. 100 sentences each. With some exceptions, every speaker read only one part of one subcorpus, giving a total of approx. 10100 recorded utterances.
The corpus contains read speech from 201 different speakers. The sentence corpus comprises 450 different sentences (including alphanumericals and two shorter passages of prose text). Each speaker read a subset of these sentences; 8 speakers read the whole sentence corpus. The speakers were recorded at four different sites in Germany (University of Kiel, University of Bonn, University of Bochum, University of Munich). The language is German. The corpus contains a total of 21681 recorded utterances.
The corpus contains read speech from 16 different speakers. Each speaker read a corpus of 200 different sentences from a train inquiry task. The speakers were recorded at three different sites in Germany (University of Kiel, University of Bonn, University of Munich). The language is German. The corpus contains a total of 3200 recorded utterances.
Spontaneous speech recorded in a dialogue task (appointment scheduling)
See the following URL for more information about the VERBMOBIL project: