Recording speech requires the operator to make a chain of decisions, starting with the choice of an appropriate microphone and ending with the choice of the coding best suited to the specific purpose. Even before that, a suitable recording environment has to be chosen: either studio recordings, which yield speech that is as ``flawless'' or ``clean'' as possible, or on-site recordings, which provide a more natural talking situation for the subject.
The major concern in both recording modes (studio/on-site) is to avoid further degradation of the speech quality once one has obtained technical control over the signal, whether through the remaining recording chain or through the way the data are sampled, coded or stored. The two modes differ in that on-site situations, such as telephone recordings, allow only very limited control, whereas in studio recordings everything from the microphone to the storage device can be determined in advance.
Consequently, this section concentrates on recommendations for the minimum requirements a recording environment should fulfil for the recording of technically ``flawless'' or ``dry'' speech. We define flawless speech as
the unweighted reproducible 1:1 transduction of an acoustic signal emitted by a speaker into a sequence of 2-byte numbers that is free of any room or environment information, exhibits a sufficient signal-to-noise ratio of at least 50 dB, and has been produced under recording conditions that do not impose any stress upon the speaker in addition to what might be intended for a given talking situation (see Chapter 4).
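The signal-to-noise criterion in this definition can be checked on a recording with a few lines of code. The following is a minimal sketch, not a prescribed measurement procedure. It assumes that the samples are already available as signed 16-bit integers and that the operator has marked one segment containing speech and one segment containing only background noise (for example a pause before the utterance). The function names `rms`, `snr_db` and `meets_snr_requirement` are illustrative only. Since the speech segment also contains the background noise, the result is strictly a (signal+noise)-to-noise ratio, which is a close approximation at the levels of interest here.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of integer samples."""
    if not samples:
        raise ValueError("segment must contain at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech, noise):
    """Ratio of speech RMS to noise RMS, expressed in dB.

    Assumption: `speech` and `noise` are hand-selected segments of the
    same recording, given as signed 16-bit sample values.
    """
    speech_rms = rms(speech)
    noise_rms = rms(noise)
    if noise_rms == 0:
        return math.inf       # digitally silent noise floor
    if speech_rms == 0:
        return -math.inf      # no signal at all
    return 20.0 * math.log10(speech_rms / noise_rms)


def meets_snr_requirement(speech, noise, minimum_db=50.0):
    """True if the recording reaches the 50 dB threshold named above."""
    return snr_db(speech, noise) >= minimum_db


if __name__ == "__main__":
    # Synthetic values for illustration only, not measured data.
    speech = [10000, -10000] * 100
    noise = [10, -10] * 100
    print(round(snr_db(speech, noise), 1))
    print(meets_snr_requirement(speech, noise))
```

For orientation, 2-byte (16-bit) linear quantisation offers a theoretical dynamic range of about 96 dB (20 log10 of 2^16), so the 50 dB requirement is in practice limited by the room, the microphone and the analogue chain rather than by the number format.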
In so far as the recording manager is able to exercise control over any recording component, everything said about studio recordings also directly applies to on-site recording situations.
Further, we describe recording techniques that might be employed in parallel with the pure speech recording, such as pitch determination by laryngograph, physiological measurements, or mimicry and gesture recording.