Why should a handbook about standards and resources for spoken language systems bother its readers with electroacoustics or other physical and technical basics? We consider this worthwhile for several reasons. Firstly, awareness of the basic physical facts increases the reliability of results obtained from speech data. This holds especially when speech data are being recorded. Furthermore, agreeing on common standards makes results comparable. At the same time, speech data and results become more valuable to the speech community. Finally, by following standards or recommendations, even those with little technical training can achieve their goals, because they are guided towards efficient ways of selecting and using appropriate tools.
The aim of this chapter is therefore to motivate readers to engage with the physical background. A single chapter cannot cover the wide range of possible speech applications or deal with every important physical aspect that may arise in this context; further explanations and assistance can be found in the literature recommended at the end of each section.
The remainder of this section introduces the concept of the communication chain, from the production of speech to its exploitation. Recording specifications should be as goal-directed as possible; we therefore distinguish between an ``ideal'' (task-independent) and a ``real-life'' (task-dependent) approach. In addition, specific requirements arising from the building of speech corpora (Chapter 3) and from speech assessment (Chapter 10) are mentioned.
Section 8.2 presents the basic terminology and common notations that a person working on speech data is likely to encounter. In Section 8.3 the human elements of the communication chain are investigated, starting with the question of how talkers and listeners can be characterised and how they should be selected for a specific task. Section 8.4 then presents the requirements for the minimum recording chain: which kind of microphone should be chosen, how the recording environment influences the recording, and how the speech data are acquired, including optional parallel recordings of fundamental frequency, physiological data, or facial expressions and gestures. Section 8.5, which is more speaker-oriented, deals with conditioning a speaker in a natural or artificial auditory and visual environment. Sections 8.6 and 8.7 return to the technical elements of the recording chain, focusing on linear and non-linear distortions and on reproducibility assurance procedures, respectively. The final section presents tools for further task-specific processing of speech data, namely signal analysis, measurement, and conditioning.