

Environment: Studio vs. on location


Recording in a studio


Most of the older speech corpora have been recorded in sound studios. Studio recordings have the drawback that most subjects do not feel at home in that environment, with all the possible consequences for their speech behaviour. However, as long as the speech to be elicited consists of lists of words, words embedded in carrier phrases, and the like, the ``abnormality'' due to the unusual reading tasks may outweigh the contribution of the unusual situation.

Studio recordings have the advantage of a superior signal-to-noise ratio, thanks to the controlled acoustic environment and, at least as importantly, the possibility of closely monitoring recording levels and the distance of the speaker to the microphone, and of using superior but delicate condenser microphones.
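
As a rough illustration of what monitoring recording levels and signal-to-noise ratio can involve, the following minimal sketch estimates both from sample values. It assumes samples normalised to the range -1.0 to 1.0 and a separately recorded noise-only segment (for example, the speaker sitting silently); the function names are illustrative and not part of any recording toolkit.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech, noise):
    """Estimate the signal-to-noise ratio in dB.

    `speech` is a segment containing speech (plus background noise),
    `noise` is a segment containing background noise only. Strictly
    speaking this is a (signal + noise)-to-noise ratio, which is close
    to the SNR when the speech is much louder than the noise.
    """
    return 20.0 * math.log10(rms(speech) / rms(noise))


def peak_dbfs(samples, full_scale=1.0):
    """Peak level relative to full scale, in dB (0 dB means clipping)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(peak / full_scale)


# Synthetic example: a "speech" tone 100 times stronger than the "noise".
speech = [0.5 * math.sin(2 * math.pi * k / 100) for k in range(1000)]
noise = [0.005 * math.sin(2 * math.pi * k / 100) for k in range(1000)]
print(round(snr_db(speech, noise), 1))  # about 40 dB
print(round(peak_dbfs(speech), 1))      # about -6 dB below full scale
```

A peak level close to 0 dB signals a risk of clipping, while a very low peak level wastes dynamic range; both are easier to avoid when an engineer can watch the levels during the session.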

One must be aware, however, that ``studio'' is not a well-defined concept. Not all rooms called studios have good acoustic properties. It is not at all unusual to find rooms which do have relatively low ambient sound levels, but at the cost of very long or extremely short reverberation times. If studios are used to record large corpora, room acoustics calibration data should be provided with the speech recordings (see Chapter 8 for details on the calibration procedure).

Many (small) corpora designed for basic speech research have been recorded in rooms which were not designed as audio studios. This is the case with most simultaneous recordings of speech and EMG signals, which are typically made in the research labs of hospitals. Only rarely have these rooms been prepared to provide acceptable room acoustics.

For many applications, high-quality speech recordings from one speaker at a time are required. These recordings should be free of background noise, including noises made by the speakers themselves. The following guidelines apply specifically to the common situation in which speakers are recorded one at a time in a sound studio.



  1. Give the speaker ample time to get accustomed to the studio. Explain the recording procedure in general terms.
  2. Start recording sessions with a number of practice items to enable speakers to get going.
  3. At the end of a long recording session, speakers can become hoarse. This can be prevented by taking a sufficient number of breaks in which the speaker can drink some water, or by splitting the long session into several shorter ones.
  4. Reduce background noises produced by speakers (e.g. moving their chair, coughing, tapping their fingers or feet, turning text pages).

Recording on location

Corpora recorded in the field have the advantage that the speaker is acting in an ecologically realistic environment. In most cases the price to be paid for this advantage is a substantial loss in signal-to-noise ratio, because of high ambient noise levels, limited possibilities for monitoring recording levels and the distance to the microphone, or both. If ecological reality dictates recordings in the field, one should nevertheless plan for conditions which allow an audio engineer to monitor the procedure.

Complete ecological validity may not be feasible. For instance, recording a speech corpus in a running car cannot safely be accomplished if the speaker is in the driver's seat.

Two important classes of recordings on location are recordings made in actual applications and recordings made over the telephone.

Recording speech in actual applications (which are based on speech input) is one obvious way to obtain ``realistic'' speech data. At least in some countries it is not legally required to advise the users of the service that their speech is being recorded, as long as the recordings are only used for research purposes internal to the company which runs the service. This procedure is probably most often used in pilot versions of an application, where the number of users is limited, so that one may realistically hope to be able to process all recorded speech. Such recordings are necessary for a systematic evaluation of the success or failure of the speech input parts of an application, which is done by relating the speech recordings to the log of the use of the application.

In any case, continued recordings in an application are an efficient means of collecting large amounts of speech data relevant to a given task. Whenever possible, such recordings should thus be made.

Recording speech on the telephone (preferably digital, i.e. ISDN) is suitable for gathering limited amounts of speech material from a large number of speakers (POLYPHONE, [Damhuis et al. (1994)]).

A possible drawback of telephone recordings is the limited bandwidth of the speech signal, typically between 300 Hz and 3000 Hz, which may pose problems for some kinds of basic speech research. For example, the absence of low-frequency components prevents a proper pitch analysis of the recorded speech using methods which rely on the presence of the fundamental frequency (frequency-domain methods which work from the higher harmonics, such as the ``harmonic sieve'', may still yield satisfactory results). The absence of high-frequency components prevents, for instance, the proper spectral analysis of consonants, especially fricatives. Apart from the limited bandwidth of the speech signal, telephone channels can also cause a substantial loss in signal-to-noise ratio, especially in the non-western countries where digital telephone systems are not yet commonplace. Even in modern digital telephone networks the signal-to-noise ratio suffers from the limited dynamic range that can be accommodated with 8 bit A-law (or μ-law) coded samples.
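
To illustrate why a frequency-domain method can recover the pitch even when the fundamental itself lies below the telephone passband, here is a deliberately simplified sketch in the spirit of a harmonic sieve. It is not the published algorithm: the candidate range, the 3% tolerance, and the tie-breaking rule (prefer the smallest total deviation, then the highest candidate) are assumptions made for this example, and the input is assumed to be a list of spectral peak frequencies that has already been extracted from the signal.

```python
def harmonic_sieve_f0(peaks_hz, f0_min=50, f0_max=500, tolerance=0.03):
    """Pick the candidate fundamental whose harmonics best fit the peaks.

    For every integer candidate f0 (in Hz), each spectral peak is compared
    with the nearest integer multiple of f0. A peak "passes the sieve" if
    its relative deviation from that multiple is within `tolerance`.
    Candidates are ranked by number of passing peaks, then by smallest
    total deviation, then by highest f0 (to avoid subharmonic errors).
    """
    best_key, best_f0 = None, None
    for f0 in range(f0_min, f0_max + 1):
        matched = 0
        error = 0.0
        for peak in peaks_hz:
            harmonic = max(1, round(peak / f0))  # nearest harmonic number
            deviation = abs(peak - harmonic * f0) / (harmonic * f0)
            if deviation <= tolerance:
                matched += 1
                error += deviation
        key = (matched, -error, f0)
        if best_key is None or key > best_key:
            best_key, best_f0 = key, f0
    return best_f0


# Peaks at the 2nd, 3rd and 4th harmonics of a 200 Hz voice; the
# fundamental itself (200 Hz) is missing, as on a telephone channel.
print(harmonic_sieve_f0([400, 600, 800]))  # 200
```

A practical implementation would also penalise sieve slots that remain empty and would work on measured, noisy peak frequencies; the sketch only shows that the spacing of the surviving harmonics carries the pitch information.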

If these drawbacks are not important for the research goal, recording telephone speech is a simple way to collect a large amount of speech data in a very short time. For some applications, such as the training and testing of a telephone speech recogniser, a speech corpus with telephone recordings is of course indispensable. It should be emphasised that telephone speech is suitable for many linguistic research projects, including research into most aspects of dialects and regional language variants as well as all aspects related to spoken language syntax and vocabulary.


  1. Recordings in an application are well suited for the collection of large amounts of realistic speech data.
  2. For each recording in an application, log all additional information, especially system reactions, e.g. success and failure, to provide a basis for performance and fault analysis.
  3. For telephone recordings, use digital equipment (ISDN).
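
The logging recommendation above can be sketched as one structured log entry per recording, written as a line of JSON. The field names, file name, and timestamp below are hypothetical and chosen only for illustration; an actual application would log whatever system reactions it can observe.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class RecordingLogEntry:
    """One log entry per recorded utterance (all field names are illustrative)."""
    recording_file: str    # where the audio for this utterance is stored
    timestamp: str         # when the utterance was recorded
    system_prompt: str     # what the application asked the user
    recognised_text: str   # what the speech input component returned
    success: bool          # whether the system reaction was correct


def to_json_line(entry):
    """Serialise an entry as a single JSON line, suitable for appending to a log."""
    return json.dumps(asdict(entry), sort_keys=True)


entry = RecordingLogEntry(
    recording_file="utt_000123.wav",
    timestamp="1997-05-01T10:15:00",
    system_prompt="Which city do you want to travel to?",
    recognised_text="amsterdam",
    success=False,
)
print(to_json_line(entry))
```

Keeping the log entry keyed to the audio file makes it straightforward to relate each recording to the system reaction during later performance and fault analysis.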



EAGLES SWLG SoftEdition, May 1997.