For some applications one must record a number of physiological signals besides the acoustic signal, such as a laryngograph signal, an electromyograph (EMG) signal, air pressure or flow in the vocal tract, articulatory parameters, X-ray data, etc. The major drawback of recording such additional signals is that speakers have to be burdened with measuring equipment, such as a strap with electrodes round the neck in the case of laryngographic recordings. One should be aware that the measuring equipment may interfere with natural speech production. It is therefore recommended to use additional signal recordings only for basic speech research and for specialised purposes, such as the examination of voice pathology, and otherwise to confine oneself to the basic acoustic signal. For some applications (audio-visual analyses) it may also be useful to make video recordings. The following are examples of ancillary information channels:
Most of these multi-channel recordings require a high technical effort. The placement of sensors may disturb the speaker (X-rays are even dangerous), and due to the considerable effort involved, only a few corpora of these kinds of measurements exist.
In a laryngograph recording a pair of electrodes is attached to the throat of the speaker, one on each side of the thyroid cartilage (Adam's apple). This sensor produces a signal proportional to the amount of contact between the vocal folds, for example during phonation.
Laryngography recordings were taken at the Eurospeech 93 conference
in Berlin, where speakers were recorded during their
presentations. The data are available in the TED corpus from the
Bavarian Archive for Speech Signals (BAS; see Appendix N)
and LIMSI.
REQUIREMENTS: Laryngography sensors, DAT tape recorder or computer interface; 8 bit quantisation, approximately 10kHz sample rate. Laryngograph recordings are also included in the EUROM-1 corpus.
Electropalatography registers the contact of the tongue with the hard palate during articulation. The speaker places a customised thin artificial palate in his mouth. This artificial palate contains an array of electrodes which record contact with the tongue.
The data recorded by the individual electrodes are combined into a
two-dimensional representation
of the palate at any given point in time.
REQUIREMENTS: Artificial palate individually tailored to a speaker, multi-channel recording device, e.g. computer with a suitable interface; 64 bit quantisation (i.e. typically an 8 × 8 array), sample rate 200Hz.
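As a rough illustration of how the 64 electrode states of one frame could be turned into a two-dimensional palate representation, the following Python sketch unpacks a frame into an 8 × 8 grid. The storage layout (one byte per row, most significant bit as the leftmost electrode) and the function names are assumptions made for this example only; actual electropalatography systems define their own formats.

```python
from typing import List

ROWS, COLS = 8, 8        # typical electrode array, as stated in the text
SAMPLE_RATE_HZ = 200     # frame rate stated in the text


def unpack_epg_frame(frame: bytes) -> List[List[int]]:
    """Return an 8 x 8 grid of 0/1 contact values for one frame.

    Assumed (hypothetical) layout: one byte per electrode row,
    most significant bit = leftmost electrode.
    """
    if len(frame) != ROWS:
        raise ValueError("expected %d bytes per frame, got %d" % (ROWS, len(frame)))
    return [
        [(row_byte >> (COLS - 1 - col)) & 1 for col in range(COLS)]
        for row_byte in frame
    ]


def contact_percentage(grid: List[List[int]]) -> float:
    """Percentage of electrodes that register tongue contact."""
    contacted = sum(sum(row) for row in grid)
    return 100.0 * contacted / (ROWS * COLS)


def epg_bytes_per_second() -> int:
    """Raw data rate: 64 bits per frame at 200 frames per second."""
    return ROWS * COLS * SAMPLE_RATE_HZ // 8
```

With these figures the raw data rate is 64 bits × 200 Hz = 12800 bits per second, i.e. 1600 bytes per second, which is small compared with the acoustic signal.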
Electromagnetic articulography (EMA) measures the movement of the tongue and other articulators through tiny induction coils attached to the tongue. The head of the speaker is enclosed by a helmet which usually holds two (or more) coils that create an electromagnetic field; the signal induced in the coils on the tongue depends on their distance from the transmitter coils on the helmet.
The EMA provides essentially the same kind of data (for parts of the vocal
tract only, because coils cannot be placed on the larynx) as the
microbeam X-ray (see Chapter 8) but uses a different technology.
REQUIREMENTS: Articulograph, multi-channel recording device, e.g. a computer with a suitable interface; data rate depends on the quantisation, the number of sensors, the number of transmitters (typically 10 sensors and 3 transmitters), and the sample rate (typically 250Hz).
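The dependence of the data rate on these four factors can be made concrete with a small calculation. The Python sketch below assumes that every sensor delivers one value per transmitter per sample; the quantisation of 12 bits used in the check is a hypothetical figure chosen for illustration, since the text does not specify one.

```python
def ema_data_rate_bits_per_second(
    n_sensors: int,
    n_transmitters: int,
    sample_rate_hz: int,
    bits_per_value: int,
) -> int:
    """Raw EMA data rate in bits per second.

    Assumption: each sensor yields one induced-signal value per
    transmitter at every sampling instant.
    """
    return n_sensors * n_transmitters * sample_rate_hz * bits_per_value


# Typical figures from the text; 12 bits per value is a hypothetical choice.
rate = ema_data_rate_bits_per_second(
    n_sensors=10, n_transmitters=3, sample_rate_hz=250, bits_per_value=12
)
```

Under these assumptions the rate is 10 × 3 × 250 × 12 = 90000 bits per second, or 11250 bytes per second.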
X-ray measurements are rarely performed today because of the health hazards they pose to the speaker. However, early recordings are still available on film or, in digital format, on laser disk (Bateson at ATR, Japan).
X-ray measurements show the changing shape of the vocal tract during
articulation.
The movement of the jaw can be seen clearly; tongue and lip movements are
often less clear because they do not show up well on X-ray. The movement
of the vocal folds is too fast to be recorded at the slow frame rate of film
recordings.
REQUIREMENTS: Seldom performed.
In air-flow measurements the speaker wears a mask (usually designed to separate oral and nasal airflow). Flow is usually derived from the pressure drop across a wire-mesh located in a flow head mounted in the mask.
The measurements yield data on the speed, direction, and volume of air flow.
Depending on the type of sensor and attachment, the measurement requires
that the
speaker does not move during articulation.
REQUIREMENTS: Air flow sensors, data acquisition hardware. The data rate depends on whether phonatory components of airflow need to be captured.
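To illustrate how flow and volume might be obtained from the pressure drop across the wire mesh, the following Python sketch assumes a linear relation between pressure drop and flow (flow = pressure drop / resistance), which is a common idealisation for this kind of flow head. The resistance value, the units, and the function names are hypothetical; a real device needs its own calibration.

```python
from typing import List


def flow_from_pressure(pressure_drop_pa: List[float],
                       resistance_pa_s_per_l: float) -> List[float]:
    """Convert pressure-drop samples (Pa) to flow samples (litres/second).

    Assumes a linear flow head: flow = pressure drop / resistance.
    The sign of the result gives the direction of the air flow.
    """
    if resistance_pa_s_per_l <= 0:
        raise ValueError("resistance must be positive")
    return [p / resistance_pa_s_per_l for p in pressure_drop_pa]


def volume_from_flow(flow_l_per_s: List[float], sample_rate_hz: float) -> float:
    """Integrate flow over time (trapezoidal rule) to get volume in litres."""
    dt = 1.0 / sample_rate_hz
    return sum(0.5 * (a + b) * dt
               for a, b in zip(flow_l_per_s, flow_l_per_s[1:]))


# Hypothetical example: five samples at 100 Hz, resistance 20 Pa*s/l.
flow = flow_from_pressure([0.0, 10.0, 20.0, 10.0, 0.0], 20.0)
volume = volume_from_flow(flow, sample_rate_hz=100.0)
```

The sampling rate chosen for such a conversion depends, as noted above, on whether phonatory components of the airflow are to be captured.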
X-ray microbeam provides two-dimensional movement data (usually in
the mid-sagittal plane)
of selected fleshpoints on the tongue and other articulators.
It uses a point-tracking technique to reduce the radiation exposure
to the subject to acceptable levels.
REQUIREMENTS: The equipment is only available at the dedicated microbeam facility in Madison, Wisconsin. Data rate: each fleshpoint is tracked at about 100-200Hz. Typically about 10 fleshpoints are tracked simultaneously.
Nuclear magnetic resonance imaging is a static (up to now) imaging technique
with very good resolution of the soft tissues in the vocal tract. Slices
can be freely chosen (sagittal, coronal, etc.).
REQUIREMENTS: a friendly hospital; sample rate < 1Hz, image resolution 256 × 256 pixels (typical, with 8 bits pixel depth)
Ultrasound imaging can be used for obtaining sagittal and coronal images of
the tongue (for those locations on the tongue where no air intervenes between
transducer and tongue; the transducer is usually held under, and moves with,
the jaw).
REQUIREMENTS: Ultrasound machine. The data is usually stored as standard video data. A frame-grabber is needed if data is to be digitised.