Microphones and speech recordings

Microphones

The requirements concerned with the choice of the right microphone for a given application can be summarised as follows:

electroacoustic performance, such as sensitivity, frequency range, transient response, non-linearities;
mechanical characteristics, such as size, robustness, etc.;
electrical characteristics in view of compatibility with other equipment;
insensitivity to external influences such as shock, vibration, electric and magnetic fields, wind;
cost, handling, and other external aspects.

For speech recording purposes under laboratory conditions, the requirements to focus on are the flattest possible frequency response and a specified type of directivity that is as constant as possible over the intended frequency range.

Conversion principles

Basically, there are two different physical effects most microphones use to convert acoustic energy into electric energy. Consequently, there are two major groups most microphones can be categorised into, depending on their functional principle.

1. Dynamic microphones

Dynamic microphones use a constant magnetic field to induce voltage in a moving coil mechanically coupled to the diaphragm. Since the output voltage of the microphone is directly generated by the conversion process, no external power supplies are required. Dynamic microphones are quite robust and may be exposed even to high sound pressure levels, which makes them suited for close-talking applications, for example in headsets. The major disadvantage of the dynamic operation principle is that in addition to the diaphragm the comparatively heavy moving coil also has to be moved by the sound pressure, resulting in a poorer transient response of the microphone. For this reason dynamic microphones are, with some exceptions, rarely used as top quality studio microphones.

2. Condenser microphones

Condenser microphones basically consist of a capacitor, one of the electrodes of which is formed by a conductive membrane. This membrane is exposed to the incident sound and, when moved back and forth by the sound pressure, slightly changes the capacitance of the capacitor. When the charge on the capacitor is kept constant, the voltage across the electrodes will follow the movements of the membrane as long as the voltage changes are small compared to the total voltage across the electrodes. Since the membrane can be manufactured from very thin plastic film material with a conductive layer of vaporised gold or aluminium, it will follow the sound pressure quite exactly and the signal produced by the microphone will be a rather precise reproduction of the original course of the sound pressure. For high-quality studio recordings most microphones used are condenser microphones.

Since the output impedance of the condenser microphone is high, all condenser microphones contain an impedance converter to render an output impedance of approximately 200Ohm. Therefore condenser type microphones need some kind of power supply not only for the impedance converter but also for the polarisation voltage across the electrodes prescribed by the operation principle. The usual way of supplying condenser microphones besides batteries is the use of a so-called phantom power supply which is connected to the output terminals of the microphone. The standard phantom power supply voltage is 48 V DC. To avoid DC offset on the speech signal, most studio microphones include an integrated optional highpass filter with passband beginning at a frequency slightly above 50Hz.

Directional characteristics

1. Omnidirectional microphones

The omnidirectional microphone is sensitive to sound regardless of the direction of incidence. Thus it will pick up the wanted sound produced by the speaker as well as unwanted background noise. This feature makes an omnidirectional microphone a bad choice when unwanted noise sources are to be expected. On the other hand, it is the simplest type of microphone from the viewpoint of microphone design. As a matter of fact, omnidirectional microphones are the most natural-sounding microphones available since the fewest design compromises have to be made. Thus, omnidirectional microphones are the best choice for high-quality speech recordings as long as the ambient noise floor can be kept low. In addition, omnidirectional microphones do not exhibit the proximity effect. The proximity effect will be dealt with when considering unidirectional microphones.

2. Unidirectional microphones

The unidirectional type of microphone is most sensitive to sound arriving from one direction and more or less attenuates incident sound from other directions. Thus, unidirectional microphones will suppress unwanted sound when pointed at the wanted sound source, i.e. the speaker.

The construction of unidirectional microphones requires additional engineering effort if a flat frequency response is desired. This is due to the fact that unidirectional microphones respond to the pressure gradient of the sound field, which is frequency dependent. To compensate for this dependence, additional tuning, either acoustic or electric, is required in order to yield a flat frequency response.

Moreover, unidirectional microphones show the so-called proximity effect. This effect occurs when spatially confined sound sources are to be picked up. The sound field of small sound sources may be approximated by spherical waves. The pressure gradient in a spherical wave is greater than the pressure gradient in a plane wave by a factor g:

$$g = \sqrt{1 + \left(\frac{c}{2 \pi f r}\right)^2}$$

where r denotes the distance between speaker and microphone, f is the frequency, and c the velocity of sound.

When r decreases, the second term in the equation increases and adds a frequency dependent component to the pressure gradient. Since the unidirectional microphone responds to the pressure gradient of the sound field, this behaviour yields a boosted bass response of the microphone at close talking distances which is termed proximity effect.
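The size of this bass boost can be estimated numerically. The following Python sketch assumes an ideal point source in a free field, for which the magnitude of the factor is g = sqrt(1 + (c/(2*pi*f*r))^2). The function names and the value c = 343 m/s are assumptions made for illustration only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def proximity_factor(f_hz, r_m, c=SPEED_OF_SOUND):
    """Ratio of the pressure gradient in a spherical wave to that in a
    plane wave (magnitude), for frequency f_hz and source distance r_m."""
    return math.sqrt(1.0 + (c / (2.0 * math.pi * f_hz * r_m)) ** 2)


def proximity_boost_db(f_hz, r_m, c=SPEED_OF_SOUND):
    """The same factor expressed as a level difference in dB."""
    return 20.0 * math.log10(proximity_factor(f_hz, r_m, c))


# Compare a headset-like distance with typical desk and room distances.
for r in (0.05, 0.30, 1.00):
    row = ", ".join(
        f"{f}Hz: {proximity_boost_db(f, r):5.1f}dB" for f in (100, 1000)
    )
    print(f"r = {r:.2f}m -> {row}")
```

The boost grows as either the distance or the frequency decreases, which is why a fixed talking distance is needed if the coloration is to stay constant.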

The proximity effect is generally unwanted except when recording musical instruments or vocalists, so that the increased bass response has to be compensated for by special microphone design with switchable bass-cut filters. In any case, the proximity effect puts constraints on the recording setup since it requires the speaker-microphone distance to be fixed when sound coloration is intolerable. The influence of the proximity effect decreases sufficiently when the talking distance is great enough, but this results in a decrease of sound pressure level which in turn has to be compensated for with additional gain at the microphone preamplifier, yielding a higher noise level.

There are several kinds of unidirectional microphone which are classified by the shape of their polar responses (Figure 8.2):

Figure 8.2: Typical polar patterns of various types of unidirectional microphone

Cardioid microphones show best ambient noise suppression for incident sound from the back. Sensitivity loss is about 6dB at the sides of the microphone and 15-25dB at the rear.
Supercardioid microphones are least sensitive at 125 degrees off-axis, 8.7dB down at the sides and approximately 15dB down at the rear.
Hypercardioid microphones are least sensitive at 110 degrees off-axis, 12dB down at the sides and approximately 6dB down at the rear.
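The figures in this list can be compared with an idealised first-order model. The sketch below assumes that each pattern follows s(theta) = a + (1 - a) cos(theta), with the textbook values a = 0.5 (cardioid), a = (sqrt(3) - 1)/2 (supercardioid) and a = 0.25 (hypercardioid). These coefficients and all function names are assumptions for illustration. Real microphones deviate from the ideal, particularly at the rear.

```python
import math

# Assumed first-order pattern coefficients: s(theta) = a + (1 - a) * cos(theta)
PATTERNS = {
    "cardioid": 0.5,
    "supercardioid": (math.sqrt(3) - 1) / 2,  # about 0.366
    "hypercardioid": 0.25,
}


def sensitivity_db(a, theta_deg):
    """Sensitivity relative to the on-axis value, in dB."""
    s = abs(a + (1 - a) * math.cos(math.radians(theta_deg)))
    return float("-inf") if s < 1e-12 else 20 * math.log10(s)


def null_angle_deg(a):
    """Off-axis angle of minimum sensitivity (valid for 0 < a <= 0.5)."""
    return math.degrees(math.acos(-a / (1 - a)))


for name, a in PATTERNS.items():
    print(
        f"{name:14s} null at {null_angle_deg(a):5.1f} degrees, "
        f"sides {sensitivity_db(a, 90):6.1f}dB, "
        f"rear {sensitivity_db(a, 180):6.1f}dB"
    )
```

In this model the ideal cardioid has a perfect null at the rear, whereas the 15-25dB quoted above reflects what practical microphones achieve.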

Typical applications for these types of microphones with respect to noise suppression are given below:

Cardioid microphones should be used when maximum attenuation is needed at the rear of the microphone.
Supercardioid microphones should be used when a maximum difference between the front and the back-hemisphere is needed.
Hypercardioid microphones should be used when maximum side rejection and the maximum rejection of reverberation and background noise is needed. Note that hypercardioid microphones show the lowest random energy efficiency (i.e. the highest directivity factor) of these types, which means the greatest rejection of random-incidence sound.

3. Bidirectional microphones (figure-of-eight characteristics)

Bidirectional microphones are most sensitive at the front and at the rear. There is a plane of minimum sensitivity perpendicular to the direction of maximum sensitivity. This behaviour makes them most suited for the recording of more than one speaker. Bidirectional microphones should not be used to produce speech recordings from one speaker.

The bidirectional microphone also exhibits the proximity effect. The effect is approximately 6dB stronger as compared to cardioid microphones.

4. Ultradirectional microphones (shotgun)

The ultradirectional microphone is designed for distant pickup, e.g. in film or TV productions. It strongly attenuates off-axis sound by means of multipath interference at a long slotted tube mounted in front of a unidirectional microphone. Compared to omni- and unidirectional microphones the sound quality is relatively poor since it has been traded against good directivity. The ultradirectional microphone is not recommended for high-quality speech recordings.

5. Pressure zone microphones

A pressure zone microphone (PZM) basically consists of an omnidirectional microphone mounted close to or into a boundary surface. The distance to the surface is significantly shorter than the wavelength given by the highest frequency to be picked up. Thus, the incident and the reflected sound will always interfere constructively, i.e. there are no comb filter distortions with this type of microphone. The directional characteristic of a PZM is basically spherical but the pickup range is limited by the boundary surface to a hemisphere. The PZM microphone is recommended for recording situations in which the talker has to sit at a table.

6. Headsets

The use of a headset microphone is recommended in all situations where a high ambient noise rejection is needed. The noise rejection properties are mainly due to the extremely close talking distance which allows preamplifier gain to be greatly reduced. Additional noise rejection can be achieved by choosing microphone capsules with directional properties. The good noise rejection behaviour has to be traded off by a degraded frequency response at low frequencies, which leads to an effect we already referred to as proximity effect (see the discussion of unidirectional microphones above).

Recording environment

As already mentioned, a specific recording environment (see also Chapter 4) is either intended or not, depending on the purpose for which the recording is made. In the latter case, the environment itself as well as any physical feedback to the talker should be virtually non-existent with respect to the actual speech signal, i.e. acoustic feedback like noise, dialogues, or on-line instructions by the recording supervisor has to be conducted via headphones. It is necessary to control environmental conditions by avoiding any undesired room acoustics. Since the talker is then deprived of his natural acoustic environment, with negative psycho-acoustic effects, some effort must be spent in making up for this (see Section 8.5.2).

For some purposes, e.g. basic phonetic research, when environmental impact on the talking subject is of no or little concern, the efforts can be limited to providing an appropriate "quiet" recording ambience (environment). For this, the number of other objects in the recording room apart from the talker himself (e.g. cameras, monitors, amplifiers, etc.), if they cannot be avoided at all, should be as small as possible. The objects should be kept as far away from the microphone as possible and, ideally, should be covered by acoustically absorbent material in order to keep unwanted and unreproducible reflections to a minimum. Furthermore, attention must be paid to the choice of the recording room itself.

Small room acoustics

For the evaluation of recording spaces for high-quality speech recordings it is necessary to deal with some basic room acoustic properties. Since only few recordings are going to be made in large rooms such as concert halls, it is appropriate to deal with the acoustics of small rooms.

The distinction between large room acoustics and small room acoustics is necessary since it must be expected that the acoustic properties of a room vary substantially if its size becomes comparable to the wavelength ($\lambda$) of sound in the audible frequency range. The latter usually holds true for relatively small rooms such as those normally used for the production of speech recordings.

It is useful to analyse possible problems by looking at the eigenmodes (roughly, resonance properties) in rooms at different frequencies. Figure 8.3 shows that the frequency dependent behaviour of any room may be treated in four frequency ranges, where the variable $L$ denotes the longest dimension of the room and $f_g$ is given as an empirical equation:

$$f_g \approx 2000 \sqrt{\frac{T}{V}}$$

with $T$ representing the reverberation time (in s) and $V$ the volume of the room (in m$^3$), giving $f_g$ in Hz.

At very low frequencies in region I the physical dimensions of the room are significantly smaller than the wavelength of sound. Thus, wave propagation is impossible in this frequency range and consequently the room acts as a pressure chamber in which the sound pressure does not depend on the probe position.

Figure 8.3: Closed room pressure zones

Region II is dominated by the first eigenmodes of the room, i.e. the wavelengths become comparable to the room dimensions. In this frequency region the acoustic properties of the room are best described by wave acoustics. Problems in this zone may arise due to constructive and destructive interference which will introduce comb filter effects when viewed in the frequency domain.

That is, when a sound source radiates sound in the frequency range given by region II, the sound pressure level that can be measured at different locations will depend strongly on the mode distribution in the room. At a fixed microphone position the measurable sound pressure level for a given frequency will depend on whether the standing waves interfere constructively or destructively at that location.

Thus, in general, the acoustic transfer function between the sound source and the microphone position will not be flat but influenced by comb filter structures as depicted in Figure 8.4.

Figure 8.4: Typical comb filter structure

In large rooms, such as lecture or concert halls, frequency region II will lie well below the relevant frequency range for speech. This is not the case for rather small rooms, such as those often used for speech recordings. In such rooms, region II will often lie well within the speech frequency range, so that these rooms will need a large amount of well-designed acoustic treatment to be usable for the desired purpose.

In particular, the concept of reverberation time, known as a helpful measure from large room acoustics, will fail since the density of eigenmodes is not large enough and each mode has its own separable decay time.

Region III determines a kind of transition behaviour of the room and is dominated by diffraction and diffusion. The rules of wave acoustics have still to be considered, and when approaching the border to region IV, the rules of large room and ray acoustics begin to become valid.

In region IV the wavelength of sound is substantially shorter than the room dimensions so that ray acoustics is a good tool for describing the behaviour of the room.
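The difference between small and large rooms can be made concrete with a rough calculation. The sketch below rests on two assumptions that are not taken from Figure 8.3: the lower edge of the modal region is placed at the lowest axial eigenmode, c/(2L), and the transition towards statistical behaviour is estimated with the commonly used empirical Schroeder frequency, 2000 * sqrt(T/V), with T in seconds and V in cubic metres. The room dimensions and reverberation times are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def lowest_axial_mode(longest_dim_m, c=SPEED_OF_SOUND):
    """Frequency of the first eigenmode along the longest room dimension."""
    return c / (2.0 * longest_dim_m)


def schroeder_frequency(t_rev_s, volume_m3):
    """Empirical estimate of where modal behaviour gives way to a
    statistical (diffuse) sound field; T in s, V in cubic metres."""
    return 2000.0 * math.sqrt(t_rev_s / volume_m3)


# Hypothetical rooms: (length, width, height) in metres, reverberation time in s
rooms = {
    "small recording room": ((4.0, 3.0, 2.5), 0.3),
    "concert hall": ((40.0, 25.0, 15.0), 2.0),
}

for name, (dims, t_rev) in rooms.items():
    volume = dims[0] * dims[1] * dims[2]
    f_low = lowest_axial_mode(max(dims))
    f_high = schroeder_frequency(t_rev, volume)
    print(f"{name}: modal region roughly {f_low:.0f}Hz to {f_high:.0f}Hz")
```

Under these assumptions the modal region of the small room extends to about 200Hz, well inside the speech range, whereas in the hall it ends below the lowest speech frequencies.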

Recording rooms

1. Laboratory room

Speech recordings in typical laboratory environments are sometimes made in a kind of workbench situation when no special recording facility is available.

Recordings made in a laboratory environment are often used to test speech recognition systems, as lab speech recordings seem to best reflect natural speech recognition situations, without requiring too much effort concerning the recording setup.

For standardisation purposes, however, the acoustic environment of a laboratory room is the least suited. Particularly when the recordings are made using a speaker sitting at a desk with the microphone being placed on the desk, the setup will lead to strong destructive interference due to reflections from the table surface.

In the frequency domain, this interference produces comb filter structures as shown in Figure 8.4, which will lead to periodic dips in the spectrum of the recorded speech signal. The frequencies where the dips can be found are dependent on the path difference of the direct and reflected sound and will strongly vary the sound coloration of the recorded speech signal when the speaker moves relative to the microphone or the table.
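The position of these dips can be estimated from the geometry. The sketch below assumes a single, ideal reflection from a hard table surface without phase inversion, so that cancellation occurs where the path difference equals an odd number of half wavelengths. The geometry values and function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def path_difference(distance_m, h_source_m, h_mic_m):
    """Reflected minus direct path length for a source and a microphone
    at the given heights above the table, a horizontal distance apart."""
    direct = math.hypot(distance_m, h_source_m - h_mic_m)
    reflected = math.hypot(distance_m, h_source_m + h_mic_m)  # mirror source
    return reflected - direct


def dip_frequencies(delta_m, f_max_hz=8000.0, c=SPEED_OF_SOUND):
    """Frequencies up to f_max_hz at which direct and reflected sound cancel."""
    dips = []
    n = 0
    while True:
        f = (2 * n + 1) * c / (2.0 * delta_m)
        if f > f_max_hz:
            return dips
        dips.append(f)
        n += 1


# Hypothetical setup: mouth 30cm above the table, microphone 10cm above it,
# 40cm horizontal distance.
delta = path_difference(0.40, 0.30, 0.10)
print(f"path difference: {delta * 100:.1f}cm")
print("dips at:", [f"{f:.0f}Hz" for f in dip_frequencies(delta)])
```

Because the dip frequencies are inversely proportional to the path difference, even small head movements shift them noticeably.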

2. Soundproof booth

A sound-insulated and acoustically treated booth or small chamber is often used in clinical audiometry or in psycho-acoustic experiments. The advantage of this kind of equipment is that it is comparably inexpensive and may easily be standardised.

The kind of environment this equipment provides is, however, not recommended for high quality speech recordings for scientific purposes, since small rooms exhibit strong eigenmodes at relatively high frequencies which may lie well within the speech frequency region. Due to the small dimensions of the booth the acoustic treatment of the inner surface will generally not suffice to provide enough absorption for the resonances to disappear.

As a consequence, speech recordings produced in this environment will exhibit strong linear distortions, i.e. sound coloration.

3. Recording studio

Speech recordings may be made in a professional recording studio. The advantage of this type of recording environment is that it is widely available and that the recording location may be rented only for the recording sessions. This will reduce cost, since the financial effort for the acoustic treatment of the studio will be restricted to the hiring fees.

The major disadvantage of using a recording studio is that the recording conditions and especially the acoustic conditions are not standardised in any way. Moreover, it will generally not be possible to design the acoustic environment of the recording room according to the needs of speech recordings.

4. Anechoic chamber

The use of anechoic chambers for speech recordings is recommended from the acoustic point of view since they exhibit well defined acoustic properties. The almost total lack of wall reflections above a critical frequency, depending on the depth of the absorptive lining of the walls, renders the best approximation to free-field conditions in a noise-insulated environment.

The presence of free-field conditions is especially important with respect to the freedom of choice of the proper microphone to be used for recording. In most of the other recording environments discussed, the type of microphone to be used is largely influenced by the properties of the room, e.g. to suppress ambient noise or wall reflections and reverberation. For example, if a studio microphone with selectable directional properties is placed in an anechoic chamber, the sound of the recording does not depend on the selected directivity of the microphone.

Of comparable importance is the fact that the distance of the microphone relative to the speaker is least influential in an anechoic chamber since the microphone is always in the direct sound field of the speaker, and changing the distance only results in changes of the microphone output level as long as the proximity effect is negligible for pressure-gradient microphones.

Problems in the anechoic chamber may arise when a natural talker's response is to be elicited, e.g. in a dialogue situation, and when inexperienced speakers are used. These problems may arise due to the more or less unnatural perceptual effect which the anechoic chamber imposes on the subjects. For this reason, an appropriate form of acoustic feedback to the speaker that gives a natural room impression is highly recommended, especially for lengthy and psycho-acoustically sensitive recordings (see Section 8.5.2).

Recording chain: studio vs. on-site

Studio

The subject of this section is the minimum recording chain, i.e. the minimum number of mutually connected components that technically transduce the acoustic speech signal into a sequence of 16 bit numbers stored on digital memory media. As depicted in Figure 8.5, this basically comprises the microphone itself, the preamplifier, the transmission line, and finally, the sampling device.

Figure 8.5: The minimum recording chain

For high-quality speech recordings, the overall noise figure, i.e. the signal-to-noise ratio (SNR) of the setup has to be taken into special consideration. Assuming normal vocal effort and a talking distance of 30cm, the SPL at the microphone capsule should rise to a level of about 75dB. If the recording takes place in an anechoic chamber, the ambient noise might level to about 20dB-SPL such that the SNR at the front end of the recording chain equals 55dB. All subsequent technical devices should be designed and connected to each other in such a way that this input SNR is degraded as little as possible. For a detailed discussion of noise figures and related terms and topics, please refer to Section 8.7.

1. Pop noise

Microphones used at close talking distances should be sufficiently protected against pop noise, generated for instance during the articulation of plosives. This is usually accomplished by either external or internal wind shields. Susceptibility to pop noises is strongly dependent on the microphone position.

The microphone should be removed from regions where considerable air flow is to be expected during articulation. A reasonable measure is to situate the microphone about 15 degrees off the direct talking axis.

2. Microphone

The internal noise of a microphone is usually given as the equivalent acoustic input noise level rendering the same output voltage. High-quality studio microphones show equivalent input noise (EIN) around 20dB-SPL, comparable to the ambient noise level in very quiet (sound-insulated) rooms. If added to the ambient noise of 20dB-SPL for such an anechoic chamber, this reduces the output SNR to about 52dB. The equivalent noise level should be mentioned in the manufacturer's specification sheet.
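The step from 55dB to about 52dB follows from adding the two noise contributions on a power basis. The short sketch below reproduces this arithmetic, assuming that ambient noise and microphone self-noise are uncorrelated; the variable and function names are illustrative.

```python
import math


def power_sum_db(*levels_db):
    """Combined level of uncorrelated sources given as dB values."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


speech_level = 75.0   # dB-SPL at the capsule, 30cm talking distance
ambient_noise = 20.0  # dB-SPL, anechoic chamber
mic_ein = 20.0        # dB-SPL, equivalent input noise of the microphone

snr_acoustic = speech_level - ambient_noise
total_noise = power_sum_db(ambient_noise, mic_ein)
snr_after_mic = speech_level - total_noise

print(f"SNR in front of the microphone: {snr_acoustic:.0f}dB")
print(f"combined noise level:           {total_noise:.1f}dB-SPL")
print(f"SNR at the microphone output:   {snr_after_mic:.1f}dB")
```

Two equal uncorrelated noise sources raise the noise floor by 3dB, which is the loss quoted above.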

3. Microphone preamplifier

The microphone preamplifier should meet standards for studio equipment. The minimal requirements for speech recordings are a flat frequency response, low distortion , low noise level, sufficient gain and a linear phase response.

The first two requirements are met by most preamplifiers built according to modern technology. The most serious problems occur in making a compromise between high gain and low noise. In general, noise generated within the preamplifier should not worsen the signal-to-noise ratio given by the EIN of the microphone. The input noise of high-quality microphone preamplifiers should be less than -125dBu at 200Ohm input impedance (dBu reference voltage: 0dBu = 0.775V, i.e. 1mW into 600Ohm), which roughly corresponds to the thermal noise of a 200Ohm resistor.
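The thermal noise floor referred to here can be estimated from the Johnson-Nyquist relation v = sqrt(4 k T R B). The sketch below assumes a temperature of 290K and a measurement bandwidth of 20kHz; neither value is specified in the text, and the result shifts by 3dB for every doubling or halving of the bandwidth.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K
DBU_REFERENCE_V = 0.775   # 0dBu, i.e. about 1mW into 600Ohm


def thermal_noise_voltage(r_ohm, bandwidth_hz=20000.0, temp_k=290.0):
    """RMS thermal noise voltage of a resistor (Johnson-Nyquist)."""
    return math.sqrt(4.0 * BOLTZMANN * temp_k * r_ohm * bandwidth_hz)


def thermal_noise_dbu(r_ohm, bandwidth_hz=20000.0, temp_k=290.0):
    """The same noise voltage expressed in dBu."""
    v = thermal_noise_voltage(r_ohm, bandwidth_hz, temp_k)
    return 20.0 * math.log10(v / DBU_REFERENCE_V)


print(f"0dBu check: {math.sqrt(1e-3 * 600):.4f}V for 1mW into 600Ohm")
print(f"200Ohm resistor, 20kHz bandwidth: {thermal_noise_dbu(200.0):.1f}dBu")
```

Under these assumptions the resistor noise lies a few dB below -125dBu, so a preamplifier meeting that figure adds only a small amount of noise of its own.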

Usually, microphone preamplifiers allow the gain to be tuned from 0 to 60dB which is sufficient for microphone distances of up to about 30cm at a reasonably low noise level. Greater microphone distances, e.g. 50-60cm, which may occur when the speech signal is picked up by a PZM microphone placed on a table in front of the talker, require amplifier gain in excess of 60dB which may result in audible noise during pauses.

4. Wiring and transmission lines

The requirement for absolutely correct wiring, i.e. sufficient shielding and grounding of the recording chain cannot be over-emphasised: this means connecting all components to a single solid ground. To avoid any unwanted induction into the transmission lines, these should possess a full-mantle shielding that has also been properly connected to the same ground (Figure 8.5). It is advisable to keep all lines as short as possible as well as to keep them away from any other electrical equipment.

It is standard in the high-quality speech-recording area to use balanced systems, i.e. to feed the speech signal into the recording chain along with its negative (180 degree phase shifted) counterpart (Figure 8.6).

Figure 8.6: Noise cancellation on balanced microphone lines

Since both conductors in a balanced system pick up the same stray signal, any noise induced into the system along the line can be cancelled out by inverting the phase-shifted signal once more and summing it with its unshifted counterpart.
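The cancellation can be illustrated with a small simulation. The sketch below is an idealised model that assumes exactly the same interference reaches both conductors. In practice the cancellation is limited by the common-mode rejection of the receiving stage. All names and signal parameters are illustrative.

```python
import math
import random


def simulate_balanced_line(n=1000, f_hz=200.0, fs_hz=16000.0,
                           noise_amp=0.5, seed=0):
    """Return (signal, hot, received) for an idealised balanced line."""
    rng = random.Random(seed)
    signal = [math.sin(2.0 * math.pi * f_hz * i / fs_hz) for i in range(n)]
    # The same stray signal is induced on both conductors.
    stray = [noise_amp * rng.uniform(-1.0, 1.0) for _ in range(n)]
    hot = [s + e for s, e in zip(signal, stray)]    # signal + interference
    cold = [-s + e for s, e in zip(signal, stray)]  # inverted signal + interference
    # The receiver inverts the cold leg again and sums: the interference cancels.
    received = [(h - c) / 2.0 for h, c in zip(hot, cold)]
    return signal, hot, received


signal, hot, received = simulate_balanced_line()
err_unbalanced = max(abs(h - s) for h, s in zip(hot, signal))
err_balanced = max(abs(r - s) for r, s in zip(received, signal))
print(f"max error on a single conductor: {err_unbalanced:.3f}")
print(f"max error after differencing:    {err_balanced:.2e}")
```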

On-site

As stated previously, everything valid for the recording chain in a studio environment in principle holds also true for on-site recordings. A major difference, however, is that in real-life recordings, additional stages may be inserted into the recording chain which exhibit more or less unknown physical properties.

Telephone recordings

A very common method of speech data sampling is via the telephone. Here, only very general statements can be made on the quality of the microphone itself, not to mention the telephone network and the possibly intervening radio links when mobile phones are in use, or when communication takes place via satellite.

Recommendations on how to use a telephone cannot be given. From the technical point of view, however, it has to be mentioned that the speech signal arriving at the receiving telephone has to be by-passed, sampled, and stored at some point prior to the acoustic output, i.e. it should never be captured by a microphone recording of the speech signal emitted from the telephone earpiece.

A coarse distinction between telephone networks can be made in terms of whether they use the analog or the digital signal domain. Whenever the operator has the choice, he should use digital telephone networks (in EU-Europe ISDN-network). This guarantees best possible signal quality in terms of noise and distortion. At the same time he must be aware of the fact that telephone networks may not be homogeneous in this respect, even within the same network.

Furthermore, the attention of operators must be directed to certain drawbacks of recording speech via the telephone:

Frequency range is limited between 300Hz and 3400Hz in contrast to a natural speech frequency range of about 75Hz to 8000Hz. In consequence, this prevents proper pitch evaluation as well as sufficient spectral analysis of high-frequency components such as those associated with fricatives, for example.
In digital telephone networks speech dynamics are limited to 42dB due to the 8 bit A-law coding (ISDN).
Transmission properties change from network to network.
Overseas communication is commonly transmitted via satellite. This potentially adds further distortions (e.g. echoes) to the speech signal.

A discussion on whether telephone recordings are suitable for a specific purpose or not may be found in Chapters 3 and 4. Details of what kinds of distortion are imposed on a speech signal in a telephone network, and how to get a figure of their magnitude, are given in Section 8.6.

Data collection

For studio recordings, the data collection stage comprises the A/D-conversion of the analog audio signal and its storage on permanent memory media. We strongly recommend using digital data storage in general, and a hard disk directly connected to the sampling device (computer) in particular.

If all phonetically relevant information in a speech signal is spectrally restricted to a frequency range from 0 to 8000Hz, the standard sampling frequency of A/D-converters for speech recording purposes, following the sampling theorem, is 16kHz. Appropriate off-the-shelf equipment for speech sampling in real-time should be available for all current computer systems. These would include all filters necessary for proper preprocessing of the analog speech signal according to the sampling theorem. Attention has to be paid to the filters involved: these must be designed to have a strictly linear phase response in order to avoid unacceptable phase distortions.
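The need for the preprocessing filters follows directly from the sampling theorem: any component above half the sampling frequency is folded back into the base band. The following sketch, with illustrative function names, computes the minimum sampling rate for a given bandwidth and the frequency at which an unfiltered component would reappear.

```python
def min_sampling_rate(f_max_hz):
    """Lowest sampling rate that represents components up to f_max_hz."""
    return 2.0 * f_max_hz


def alias_frequency(f_hz, fs_hz):
    """Frequency at which a sinusoid of f_hz appears after sampling at fs_hz."""
    f_mod = f_hz % fs_hz
    return min(f_mod, fs_hz - f_mod)


fs = min_sampling_rate(8000.0)
print(f"minimum sampling rate for 8000Hz bandwidth: {fs:.0f}Hz")
for f in (7000.0, 9000.0, 12000.0):
    print(f"{f:.0f}Hz sampled at {fs:.0f}Hz appears at {alias_frequency(f, fs):.0f}Hz")
```

A 9000Hz component that is not removed before conversion is thus indistinguishable from a 7000Hz component in the sampled signal.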

The standard format of speech data is SHORT (16 bits, signed, linear) which corresponds to a representable value range of -32768 up to +32767, i.e. maximum recording dynamics of 96dB. With a properly calibrated microphone preamplifier at the front end, this should suffice for a peak factor in the recording session as well as the projected SNR of about 50dB at the microphone output.
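The value range and the dynamics of a linear format follow from the word length alone, at roughly 6dB per bit. A minimal sketch of this arithmetic, with illustrative function names:

```python
import math


def value_range(bits):
    """Smallest and largest value of a signed two's complement word."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def dynamic_range_db(bits):
    """Ratio of full scale to one quantisation step, in dB."""
    return 20.0 * math.log10(2 ** bits)


low, high = value_range(16)
print(f"16 bit linear: {low} to {high}, {dynamic_range_db(16):.1f}dB")

# Margin left for the peak factor once the projected SNR is accommodated.
projected_snr_db = 50.0
print(f"remaining margin: {dynamic_range_db(16) - projected_snr_db:.1f}dB")
```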

Alternatively, a DAT (Digital Audio Tape) may be used to store the speech data. The standard sampling frequency is 48kHz with a 16 bit resolution. This poses less strict requirements in view of the linearity of the filters involved. On the other hand it is rather cumbersome to access recordings made by a DAT for further processing.

When speech has been collected via a digital telephone network it might be necessary to resample the incoming signal according to the required sampling frequency of the recording station. On a digital recording device this is easily achieved by standard algorithms; if a DAT is used to record the digital signal, proper D/A-conversion is necessary. The easiest way to control the domain of the speech data (analog/digital) is to put signal extraction at a position in the receiving telephone that gives access to the data in either analog or digital form. Note in particular that the ISDN signal is encoded as A-law.
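To process A-law data as linear samples, the companding has to be undone first. The sketch below shows only the continuous A-law characteristic with A = 87.6 on samples normalised to the range -1 to 1. It is an illustration of the principle under that assumption and not a bit-exact codec: the 8 bit code words actually transmitted use a segmented approximation of this curve.

```python
import math

A = 87.6  # compression parameter of the A-law characteristic
_DENOM = 1.0 + math.log(A)


def alaw_compress(x):
    """Continuous A-law characteristic for a sample x in [-1, 1]."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / _DENOM                    # linear segment near zero
    else:
        y = (1.0 + math.log(A * ax)) / _DENOM  # logarithmic segment
    return math.copysign(y, x)


def alaw_expand(y):
    """Inverse of alaw_compress for a compressed value y in [-1, 1]."""
    ay = abs(y)
    if ay < 1.0 / _DENOM:
        x = ay * _DENOM / A
    else:
        x = math.exp(ay * _DENOM - 1.0) / A
    return math.copysign(x, y)


for x in (0.001, 0.01, 0.1, 1.0):
    y = alaw_compress(x)
    print(f"x = {x:5.3f} -> compressed {y:.3f} -> expanded {alaw_expand(y):.3f}")
```

Small amplitudes are spread over a much larger share of the code range than in a linear format, which is how 8 bits can cover a usable speech level range.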

Recording procedure

The recording procedure comprises a whole range of measures, beginning with the calibration of the microphone and ending with the design of proper interaction between the talking subject and the recording manager. A detailed description of various aspects of concern with regard to the recording procedure is presented in Chapter 4.

From a technical point of view, however, the calibration and the positioning of the microphone are of central interest. It goes without saying that calibration is to be omitted in on-site situations like telephone recordings, for example.

Recommendations on microphones and speech recordings

For each of the preceding subsections we give a separate paragraph of recommendations:

Recommendations on microphones

The choice of the right microphone strongly depends on the specific task to be performed. In on-site recording situations, often no decision on the microphone can be made. With respect to the best quality obtainable, however, we can give the following recommendations:

Always choose professional equipment for speech recordings, especially when it comes to microphones.
For the flattest possible sensitivity response over the entire frequency range of speech pick a condenser microphone.
In very quiet environments choose a microphone that exhibits omnidirectional directivity characteristics. They are the easiest to use with regard to position and orientation.
In reverberant and/or noisy environments a unidirectional cardioid microphone represents a reasonable compromise between noise suppression and flexibility in handling. It eliminates perturbation signals arriving from off-axis directions greater than 65 degrees (-3dB).
Recordings in a car should be performed with unidirectional microphones presenting a sensitivity response with considerable magnitude attenuation at frequencies below 500Hz. This is due to the fact that most of the acoustic energy emitted by a car or truck originates from the 100-300Hz band. Microphones of this type are especially designed for hands-free mobile telephones in cars.
If a fixed position of the microphone with respect to the speaker's head as well as high ambient noise suppression is crucial, use a headset.

Recommendations on the recording environment

In order to achieve speech recordings with minimum environmental (room) distortions the following recommendations should be followed:

If available, recording should take place in an anechoic chamber.
Placing equipment in the recording room should be avoided as far as possible; if it is unavoidable, place equipment as far away from the microphone as possible and, ideally, cover it with acoustically absorbent material.
A direct feedback path of first-order reflections between mouth-manuscript-microphone should be avoided.
If negative effects on the talker's prosody, due to missing acoustic room information, are a cause for concern, proper room simulation via headphones is recommended (refer to Section 8.5.2).

Recommendations on the recording chain: studio

In view of the recording chain we may give the following recommendations:

As a general guideline, always utilise professional equipment.
To avoid pop noise have the microphone properly wind shielded.
If condenser microphones are used, activate the built-in high-pass to suppress potential offsets induced by the phantom-power supply.
The microphone preamplifier should fulfil studio standards, with a noise figure less than -125dBu at 200Ohm, gain range 0-60dB.
Transmission lines must be properly balanced, properly grounded and shielded, and short.

Recommendations on the recording chain: on-site

Where the operator has control over components in the recording chain, the recommendations in the preceding section hold true. The field of on-site recordings is quite literally wide open, so that recommendations must be restricted to the very common case of data collection via telephone:

Do not record speech from the loudspeaker of the receiving telephone. Instead, by-pass the signal at some point prior to the audio stage in the phone.
If there is a choice, always use digital telephone transmission (such as ISDN) for best possible signal quality. Keep in mind that conditions may not be homogeneous, even within the same network.
Be aware of certain limits in speech data collected by telephone concerning the obtainable SNR, dynamics and the bandwidth of the signal.

Recommendations on data collection

Only utilise professional equipment when it comes to data collection.
Use digital data storage media, ideally a computer hard disk.
The standard data format is: sampling rates 16/22.05/32/44.1/48kHz, 16 bit, linear PCM. Notice the simple conversion ratios for digital signal processing for the sampling rates 16/32/48kHz and 22.05/44.1kHz.
Use computer driven sampling devices. They provide best possible data handling, especially in comparison to DAT recordings.
Filters involved in the preprocessing of the analog speech signal must be designed to have a strictly linear phase response in order to avoid unacceptable phase distortions.
Digital speech data collected via telephone could be resampled according to the specification of the sampling device. This can often be circumvented by proper positioning of the signal by-pass in the receiving telephone.
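The remark on simple conversion ratios in the list above can be checked by reducing the ratio of two sampling rates to lowest terms. The sketch below uses the 22.05kHz member of the 44.1kHz family and adds the 8kHz telephone rate as an assumed example; the function name is illustrative.

```python
from fractions import Fraction


def resampling_ratio(fs_in_hz, fs_out_hz):
    """Interpolation and decimation factors (L, M) for a rational
    sampling rate conversion from fs_in_hz to fs_out_hz."""
    ratio = Fraction(fs_out_hz, fs_in_hz)
    return ratio.numerator, ratio.denominator


for fs_in, fs_out in ((8000, 16000), (16000, 48000), (32000, 48000),
                      (44100, 22050), (44100, 48000)):
    up, down = resampling_ratio(fs_in, fs_out)
    print(f"{fs_in}Hz -> {fs_out}Hz: interpolate by {up}, decimate by {down}")
```

Within one family of rates the factors are small integers, whereas a conversion between the families, such as 44.1kHz to 48kHz, requires the much less convenient ratio 160/147.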

Recommendations on the recording procedure

Microphone calibration: observe the maximum level a subject produces during a test phase, and set the amplifier gain such that the observed peak level is about 12dB below the maximum possible recording level.
Microphone positioning (omnidirectional): 20-30cm distance, 90 degrees incident, 15 degrees off-axis.
Microphone positioning (unidirectional): 40-50cm distance, 0-60 degrees incident.
Microphone positioning (pressure zone): 50-60cm distance, situated on a table in front of the talker.
Microphone positioning (headset): ca. 5cm distance, 20-30 degrees off-axis, at the same level as the lower lip.
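The calibration rule given in the list above can be turned into a simple calculation on a test recording. The sketch below assumes 16 bit linear samples and takes full scale as 32768; the function names and the sample values are hypothetical.

```python
import math

FULL_SCALE = 32768  # assumed full-scale magnitude of 16 bit linear samples


def peak_level_dbfs(samples):
    """Peak level of a test recording relative to full scale, in dB."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        raise ValueError("test recording contains only silence")
    return 20.0 * math.log10(peak / FULL_SCALE)


def gain_correction_db(samples, target_dbfs=-12.0):
    """Change of preamplifier gain that places the observed peak at
    target_dbfs (negative result: reduce the gain)."""
    return target_dbfs - peak_level_dbfs(samples)


test_take = [0, 16384, -8192, 4096]  # hypothetical samples from the test phase
print(f"observed peak:   {peak_level_dbfs(test_take):.1f}dB re full scale")
print(f"gain correction: {gain_correction_db(test_take):+.1f}dB")
```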

EAGLES SWLG SoftEdition, May 1997.