
Corpora

Presently the following German speech corpora are available on ISO 9660 CD-ROM:

Siemens 1000 - SI1000 (5 CD-ROMs)

The corpus contains read speech of 10 different speakers. Each speaker has read approx. 1000 sentences from a German newspaper corpus, resulting in a total of approx. 10000 recorded utterances.

Siemens 100 - SI100 (7 CD-ROMs)

The corpus contains read speech of 101 different speakers. Each speaker has read approx. 100 sentences from either the SZ subcorpus or the CeBit subcorpus. The language is German. The SZ subcorpus contains 544 sentences from newspaper articles; the CeBit subcorpus contains 483 sentences from newspaper articles about CeBit 1995. Each subcorpus is divided into 5 parts of approx. 100 utterances each. With some exceptions, every speaker read only one part of one subcorpus, resulting in a total of approx. 10100 recorded utterances.

PHONDAT 1 - PD1 (4 CD-ROMs, 2nd edition)

The corpus contains read speech of 201 different speakers. Each speaker has read a subcorpus drawn from a corpus of 450 different sentences (including alphanumericals and two shorter passages of prose text); 8 speakers have read the whole sentence corpus. The speakers were recorded at four different sites in Germany (University of Kiel, University of Bonn, University of Bochum, University of Munich). The language is German. The corpus contains a total of 21681 recorded utterances.

PHONDAT 2 - PD2 (1 CD-ROM, 2nd edition)

The corpus contains read speech of 16 different speakers. Each speaker has read a corpus of 200 different sentences from a train inquiry task. The speakers were recorded at three different sites in Germany (University of Kiel, University of Bonn, University of Munich). The language is German. The corpus contains a total of 3200 recorded utterances.

VERBMOBIL

The corpus contains spontaneous speech recorded in a dialogue task (appointment scheduling).

See the following URL for more information about the VERBMOBIL project:

http://www.dfki.uni-sb.de/verbmobil/

Strange Corpus 1 - SC1 (Accents) (1 CD-ROM)

The corpus contains the story "Nordwind und Sonne" read by 72 speakers with foreign accents and 16 native German speakers. The utterances read by the latter are phonologically segmented by hand.
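
The utterance totals quoted above for SI1000, SI100 and PD2 follow from multiplying the number of speakers by the (approximate) number of sentences each speaker read. The short Python sketch below reproduces that arithmetic from the catalog figures. The names Corpus, CORPORA and nominal_total are hypothetical and chosen for illustration only; they are not part of any distributed corpus tooling. PD1 is left out because its speakers read subcorpora of differing sizes, so its total cannot be derived this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Corpus:
    """One catalog entry; all figures are copied from the descriptions above."""
    name: str
    speakers: int
    sentences_per_speaker: int  # approximate for SI1000 and SI100
    stated_total: int           # total utterances quoted in the catalog


# Hypothetical in-memory summary of three of the catalog entries.
CORPORA = [
    Corpus("SI1000", speakers=10, sentences_per_speaker=1000, stated_total=10000),
    Corpus("SI100", speakers=101, sentences_per_speaker=100, stated_total=10100),
    Corpus("PD2", speakers=16, sentences_per_speaker=200, stated_total=3200),
]


def nominal_total(corpus: Corpus) -> int:
    """Speakers times sentences per speaker, ignoring the noted exceptions."""
    return corpus.speakers * corpus.sentences_per_speaker


if __name__ == "__main__":
    for corpus in CORPORA:
        print(f"{corpus.name}: {nominal_total(corpus)} nominal, "
              f"{corpus.stated_total} stated")
```

For the approximate entries the nominal product only matches the quoted total because both are rounded figures; the actual number of recordings on the CD-ROMs may differ slightly.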



EAGLES SWLG SoftEdition, May 1997.