
The communication chain

   

Because spoken utterances can be produced, processed and exploited in many different ways, the architecture of the so-called communication chain varies tremendously. We therefore define the communication chain broadly, as the connection(s) between a talker and a listener via an auditory, a visual and/or an electric channel. While these are parallel channels of information flow, the electric channel itself may consist of serial and/or parallel connections of electric devices and channels. Figure 8.1 gives a rough impression of this somewhat simplified scheme.

Figure 8.1: Scheme of the communication chain

This scheme consists of the following elements:

  1. The talker (source) and the recording environment have to be regarded as a whole.
  2. Auditory and visual environmental factors affect both the talker's behaviour and the signals picked up at the sensor's position.
  3. We have to be aware of this talker-environment feedback, even though the communication chain suggests a strictly unidirectional flow from talker to listener.
  4. To collect the talker's data of interest (e.g. speech, lip movements or glottal frequency), sensors and transducers have to be applied; a transducer converts acoustic or mechanical energy into an electrical signal (as a microphone does) or vice versa (as loudspeakers and headphones do).
  5. If the electrical signal has to be analysed or manipulated, this can be done with signal processing devices such as filters, amplifiers, A/D converters, or computers running room simulation algorithms.
  6. The processed signal may be played back via loudspeakers, sent to a storage device, or transmitted via an information channel such as the telephone line.
  7. The listener makes up the end (sink) of the communication chain.
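The elements listed above can be thought of as stages that pass a signal along the chain, and the electric part of the chain as serial and parallel connections of such stages. The following is only a minimal sketch under loudly stated assumptions: the talker is a hypothetical pure tone at a glottal frequency, the acoustic environment is modelled as additive white noise, the A/D converter is an ideal quantiser with a full scale of +/-1.0, and all function names (`serial`, `parallel`, `talker`, `environment`, `amplifier`, `ad_converter`) are hypothetical, not part of any existing toolkit.

```python
import math
import random
from typing import Callable, List

# A signal is modelled as a list of samples; a stage maps one signal to another.
Signal = List[float]
Stage = Callable[[Signal], Signal]


def serial(*stages: Stage) -> Stage:
    """Connect stages in series: each stage processes the previous stage's output."""
    def run(signal: Signal) -> Signal:
        for stage in stages:
            signal = stage(signal)
        return signal
    return run


def parallel(*branches: Stage) -> Callable[[Signal], List[Signal]]:
    """Feed one signal into several independent branches (e.g. two microphones)."""
    def run(signal: Signal) -> List[Signal]:
        return [branch(signal) for branch in branches]
    return run


def talker(f0: float, fs: int, n_samples: int) -> Signal:
    """Hypothetical source: a pure tone at the glottal frequency f0 (Hz), amplitude 0.5."""
    return [0.5 * math.sin(2 * math.pi * f0 * i / fs) for i in range(n_samples)]


def environment(noise_level: float, seed: int = 0) -> Stage:
    """Assumed environment model: additive Gaussian white noise with std noise_level."""
    rng = random.Random(seed)
    return lambda s: [x + rng.gauss(0.0, noise_level) for x in s]


def amplifier(gain: float) -> Stage:
    """Linear amplifier: multiplies every sample by a constant gain."""
    return lambda s: [gain * x for x in s]


def ad_converter(bits: int) -> Stage:
    """Ideal A/D converter: round to signed integer codes, clipping at full scale."""
    top = 2 ** (bits - 1) - 1
    def run(s: Signal) -> Signal:
        return [max(-top - 1, min(top, round(x * top))) for x in s]
    return run


if __name__ == "__main__":
    speech = talker(f0=120.0, fs=16000, n_samples=160)
    # Serial chain: environment -> amplifier -> 16-bit A/D conversion.
    recording = serial(environment(0.01), amplifier(1.5), ad_converter(16))(speech)
    # Parallel branches: a clean close-talking path and a noisier distant path.
    close, distant = parallel(
        serial(amplifier(1.0), ad_converter(16)),
        serial(environment(0.05), amplifier(0.5), ad_converter(16)),
    )(speech)
    print(recording[:5], close[:5], distant[:5])
```

The sketch only passes signals forward, so the talker-environment feedback mentioned in item 3 is deliberately not modelled; representing it would require the environment stage to influence how the source itself is generated.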

In this chapter we distinguish between two opposite strategies for specifying a communication chain in practice. The first strategy, called the ideal or flawless approach, tries to capture the speech signal as cleanly as possible and independently of any particular domain or scenario. The advantage is that such data can be used for many tasks with "average" suitability, although they are not ideally adapted to the specific conditions of any one task. A further advantage is flexibility in exploiting the same data: many post-processing options exist, so that many task-specific signal characteristics can be imposed after the recording itself. However, the talker's condition and some environmental factors are still reflected in the "clean" data, and the possibilities for subsequent correction or manipulation are limited. To obtain so-called flawless speech, we face a dilemma: on the one hand we want to encourage a natural way of speaking, and on the other hand we want to optimise the technical conditions of the recording session. At a minimum, decisions have to be made about the kind of speaker (cf. Section 8.3), the kind of auditory and visual environment (cf. Section 8.5), and how to capture the speech signal optimally (cf. Section 8.4).
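As an illustration of imposing a task-specific characteristic after the recording, the sketch below turns clean wideband speech into telephone-band speech by band-pass filtering it to roughly 300 to 3400 Hz and then halving the sampling rate from 16 kHz to 8 kHz. It is a minimal plain-Python sketch under stated assumptions: the 16 kHz input rate, the Hamming-windowed sinc filter design, the 401-tap length and the function names (`lowpass_taps`, `bandpass_taps`, `fir_filter`, `telephone_channel`) are illustrative choices, not a procedure prescribed by this handbook. It also shows the limitation noted above: filtering can only remove information, so noise or traces of the talker's condition already present in the clean recording are not undone.

```python
import math
from typing import List


def lowpass_taps(cutoff: float, fs: float, num_taps: int) -> List[float]:
    """Hamming-windowed sinc low-pass FIR filter (num_taps should be odd)."""
    m = (num_taps - 1) / 2
    fc = cutoff / fs  # normalised cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        k = n - m
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def bandpass_taps(low: float, high: float, fs: float, num_taps: int) -> List[float]:
    """Band-pass filter built as the difference of two low-pass filters."""
    lo = lowpass_taps(low, fs, num_taps)
    hi = lowpass_taps(high, fs, num_taps)
    return [h - l for h, l in zip(hi, lo)]


def fir_filter(signal: List[float], taps: List[float]) -> List[float]:
    """Direct-form FIR filtering with zero initial state; output length equals input length."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j in range(min(len(taps), i + 1)):
            acc += taps[j] * signal[i - j]
        out.append(acc)
    return out


def telephone_channel(clean: List[float], fs: int = 16000, num_taps: int = 401) -> List[float]:
    """Impose an assumed 300-3400 Hz telephone band on clean 16 kHz speech, output at 8 kHz."""
    if fs != 16000:
        raise ValueError("this sketch assumes 16 kHz input")
    taps = bandpass_taps(300.0, 3400.0, fs, num_taps)
    band_limited = fir_filter(clean, taps)
    # The band-pass also serves as the anti-aliasing filter (3400 Hz < 4000 Hz),
    # so keeping every second sample halves the sampling rate to 8 kHz.
    return band_limited[::2]
```

In use, a 1000 Hz tone passes almost unchanged, while a 100 Hz hum component or a 6000 Hz component is strongly attenuated once the filter's start-up transient (the first num_taps input samples) has passed.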

The opposite recording strategy may be called a real-life or on-site approach: from the very beginning, the communication chain is adapted as closely as possible to a specific scenario. For instance, if a speech recognition device that forms part of an information system for in-car inquiries via a mobile phone is to be evaluated, the speaker has to sit at the wheel of a moving car and drive it, and the speech data have to be transmitted over the wireless telephone network. In this and similar cases, simulating the acoustic environment is not the crucial point; rather, the situation-dependent speaking style significantly influences the resulting speech signal. This approach faces the dilemma of ensuring real-life conditions on the one hand while still performing the recording in an optimal way on the other.

EAGLES SWLG SoftEdition, May 1997.