Speech file and associated description file formats

The agreed standard for SAM speech databases is that a speech file contains only speech waveforms, and that an associated description file is generated during the recording session. The two files are matched: their names are identical except for the last letter of the extension.

For example, if speaker AA records corpus number BB (a list of six sentences in English), and the currently available file number in the recording lab is nnnn, the files produced will be:

AABBnnnn.SES    sampled speech
(AABBnnnn.SEL   L for Laryngograph)
(AABBnnnn.SE2   second channel signal file)
AABBnnnn.SEO    associated description file, generated automatically during recording (O = orthographic time-aligned labelling)

The associated description file has the standard label file format, with a header and a body (see C.3.1 for the header format of label files and C.3.2 for the body of a label file). It contains all the information usually needed by people working on the files without a database management system.
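
The naming scheme described above can be sketched in a few lines of Python. This is an illustration only, not part of the SAM standard: the function names `sam_file_names` and `companion_name` are hypothetical, and the field widths (two characters each for speaker and corpus, a zero-padded four-digit file number) and the fixed extension prefix "SE" are assumptions read off the pattern AABBnnnn.SES in the example.

```python
def sam_file_names(speaker, corpus, file_number,
                   laryngograph=False, second_channel=False):
    """Build the matched file names for one recording (hypothetical helper).

    Assumes 2-character speaker and corpus codes and a 4-digit,
    zero-padded file number, as suggested by the pattern AABBnnnn.
    """
    if len(speaker) != 2 or len(corpus) != 2:
        raise ValueError("speaker and corpus codes are assumed to be 2 characters")
    if not 0 <= file_number <= 9999:
        raise ValueError("file number is assumed to fit in 4 digits")

    stem = f"{speaker}{corpus}{file_number:04d}"
    prefix = "SE"  # assumption: first two extension letters copied from the example

    names = {
        "speech": f"{stem}.{prefix}S",       # sampled speech
        "description": f"{stem}.{prefix}O",  # associated description file
    }
    if laryngograph:
        names["laryngograph"] = f"{stem}.{prefix}L"
    if second_channel:
        names["second_channel"] = f"{stem}.{prefix}2"
    return names


def companion_name(file_name, kind="O"):
    """Return the matched file name, changing only the last extension letter."""
    stem, dot, ext = file_name.rpartition(".")
    if not dot or len(ext) != 3:
        raise ValueError("expected a name with a 3-character extension")
    return f"{stem}.{ext[:2]}{kind}"
```

For instance, `companion_name("AABB0012.SES")` yields the name of the description file that belongs to that speech file, since the two names differ only in the last letter of the extension.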

