Analog and digital speech signal representation

Today most signal processing and transmission is carried out in the digital domain. Compared with the analog domain it offers greater robustness, and digital computers and associated digital hardware are flexible (programmable), reliable, and can work to arbitrary accuracy. It is therefore recommended that digital data be used wherever possible. In many cases it is necessary to switch into the digital domain, for instance for storing speech data on Digital Audio Tapes (DATs) or CD-ROMs, or for further processing on computers. To understand the characteristics and limits of digital signal representation, the basic concepts of sampling and quantisation must be understood.

Sampling

An analog signal is continuous in both time and amplitude. The transition to a time-discrete but amplitude-continuous signal is performed by the sampling process: by taking one amplitude value (or sample) every T seconds, the original waveform is converted into a train of pulses. This signal representation is called Pulse Amplitude Modulation (PAM), and all coding methods that try to reconstruct this pulse train are called waveform coding.

T stands for the sampling interval, and f_s = 1/T stands for the sampling rate or sampling frequency. To choose T without any loss of information, the sampling theorem has to be borne in mind: a band-limited analog signal may be represented by time-discrete sample values taken at constant time intervals T without any information loss if T ≤ 1/(2 f_c), that is, with sampling rate f_s = 1/T ≥ 2 f_c. (This is only defined for low-pass signals limited to below a specified cut-off frequency f_c, that is, with spectrum X(f) = 0 for |f| ≥ f_c.) This means that the highest frequency component in the analog signal to be sampled has to be lower than half of the sampling rate f_s. If in doubt, this has to be guaranteed by low-pass filtering the signal before sampling. Otherwise, the analog signal cannot be reconstructed from the samples without severe errors, commonly called aliasing.
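
The effect of violating the sampling theorem can be illustrated with a small sketch. It is not part of the original text: the function names and the values 8000 Hz and 7000 Hz are chosen only for illustration. A cosine whose frequency lies above half the sampling rate produces exactly the same samples as a lower-frequency cosine, so the two cannot be told apart after sampling.

```python
import math


def sample_cosine(freq_hz, fs_hz, n_samples):
    """Return n_samples of cos(2*pi*freq_hz*t), one taken every 1/fs_hz seconds."""
    interval = 1.0 / fs_hz  # sampling interval in seconds
    return [math.cos(2.0 * math.pi * freq_hz * n * interval)
            for n in range(n_samples)]


def alias_frequency(freq_hz, fs_hz):
    """Frequency in [0, fs_hz/2] whose samples match those of a tone at freq_hz."""
    folded = freq_hz % fs_hz
    return min(folded, fs_hz - folded)


fs = 8000    # sampling rate in Hz (illustrative value)
tone = 7000  # above fs/2, so the sampling theorem is violated
alias = alias_frequency(tone, fs)

high = sample_cosine(tone, fs, 16)
low = sample_cosine(alias, fs, 16)

# The two sample sequences agree up to floating-point rounding.
largest_gap = max(abs(a - b) for a, b in zip(high, low))
print(alias, largest_gap)
```

A tone below half the sampling rate, for example 3000 Hz at the same rate, is mapped to itself by `alias_frequency`, which is the case the low-pass filter before sampling is meant to guarantee.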

While PAM offers time-multiplexing capabilities, the pulse amplitudes are still sensitive to noise.

Quantisation and coding

In a second step, quantisation, the sample amplitudes are binary coded with a word length of w bits per sample, in order to achieve an amplitude-discrete representation (linear Pulse Code Modulation or Lin-PCM). For each sample, the nearest of the 2^w possible amplitude values has to be chosen; the difference from the original amplitude is the quantisation error. While the bandwidth requirements increase because w bits (pulses) are coded per sample, the digital signal is resistant to added noise as long as the noise does not exceed one quantisation step and the signal amplitude does not exceed the maximum discrete amplitude range. In order to waste no quantisation steps or bits, the recording level has to be controlled so that the full recording range is used without overload. The quality of linear PCM is commonly described by the signal-to-noise ratio (SNR), the ratio of signal power to noise power.
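
As a rough illustration of linear quantisation and its SNR, the following sketch quantises a full-scale test tone with w = 8 bits and measures the ratio of signal power to quantisation noise power. The quantiser design (a uniform mid-rise quantiser over the range -1 to 1), the test tone, and all names are assumptions made for this example, not taken from the text. The comparison value 6.02 w + 1.76 dB is the usual textbook approximation for a full-scale sine wave.

```python
import math


def quantise(x, w):
    """Uniform mid-rise quantiser for x in [-1, 1) using w bits (2**w levels)."""
    levels = 2 ** w
    step = 2.0 / levels                    # width of one quantisation step
    index = math.floor(x / step)
    # Clip to the available range; larger inputs would overload the quantiser.
    index = max(-levels // 2, min(levels // 2 - 1, index))
    return (index + 0.5) * step            # centre of the chosen step


def snr_db(signal, quantised):
    """Signal-to-noise ratio in dB: signal power over quantisation error power."""
    signal_power = sum(s * s for s in signal)
    noise_power = sum((s - q) ** 2 for s, q in zip(signal, quantised))
    return 10.0 * math.log10(signal_power / noise_power)


w = 8  # bits per sample (illustrative)
# Full-scale 441 Hz sine sampled at 16 kHz for one second (hypothetical test tone).
signal = [math.sin(2.0 * math.pi * 441.0 * n / 16000.0) for n in range(16000)]
quantised = [quantise(s, w) for s in signal]

measured = snr_db(signal, quantised)
approximation = 6.02 * w + 1.76
print(measured, approximation)
```

Lowering the amplitude of `signal` while keeping w fixed reduces the measured SNR, which is why the recording level should use the full range without overload.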

In addition to this linear time-invariant coding of the original sampled signal, various modifications have been proposed to take full advantage of the long-time or short-time characteristics of the speech signal [Rabiner & Schafer (1978)]. One of these methods is logarithmic PCM (A-law or μ-law Log-PCM), which uses a higher quantiser resolution at small signal amplitudes and larger quantisation steps at high amplitudes. Another improvement can be achieved by continuously adapting the range of the quantiser to the short-time signal amplitude. A different category, the so-called parametric coding strategies, applies assumptions about the speech production process within an "intelligent" speech coder, thereby shifting the costs from the transmission line (where very low bit rates can be achieved) to the signal analysis and synthesis stage.
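
The idea behind logarithmic PCM can be sketched as follows. This is a minimal illustration using the continuous μ-law compression curve with μ = 255, a commonly used value that the text itself does not state. Practical coders use a piecewise-linear approximation of this curve, which is not reproduced here, and all function names are hypothetical. The sample is compressed, quantised uniformly with 8 bits, and expanded again; for small amplitudes the resulting error is compared with that of plain linear 8-bit quantisation.

```python
import math

MU = 255.0  # assumed compression parameter (common choice for 8-bit mu-law)


def compress(x):
    """Continuous mu-law compression of x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y):
    """Inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantise(x, w):
    """Uniform mid-rise quantiser for x in [-1, 1) using w bits."""
    levels = 2 ** w
    step = 2.0 / levels
    index = max(-levels // 2, min(levels // 2 - 1, math.floor(x / step)))
    return (index + 0.5) * step


def log_pcm(x, w):
    """Quantise in the compressed domain, then map back to a linear amplitude."""
    return expand(quantise(compress(x), w))


# Small amplitudes, up to 1 % of full scale.
small = [i / 10000.0 for i in range(-100, 101)]
worst_linear = max(abs(quantise(x, 8) - x) for x in small)
worst_log = max(abs(log_pcm(x, 8) - x) for x in small)
print(worst_linear, worst_log)
```

For these small amplitudes the largest error of the logarithmic scheme is well below that of the linear quantiser; near full scale the relation is reversed, since the logarithmic steps are larger there.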

For further reading consult [O'Shaughnessy (1987), Rabiner & Schafer (1978), Pierce (1991)].

EAGLES SWLG SoftEdition, May 1997.