Next: Talker / listener descriptors Up: Basic notations and terminology Previous: Phones and sones

Analog and digital speech signal representation


Today most signal processing and transmission is carried out in the digital domain. Compared with the analog domain, this offers greater robustness as well as the flexible (programmable) and reliable use of digital computers and associated digital hardware to arbitrary accuracy. It is therefore recommended that digital data be used wherever possible. In many cases it is necessary to switch into the digital domain, for instance for storing speech data on Digital Audio Tapes (DATs) or CD-ROMs, or for further processing on computers. To understand the characteristics and limits of digital signal representation, the basic concepts of sampling and quantisation must be understood.


An analog signal is continuous in both time and amplitude. The transition to a time-discrete but amplitude-continuous signal is performed by the sampling process: by taking one amplitude value (or sample) every $T$ seconds, the original waveform is converted into a train of pulses. This signal representation is called Pulse Amplitude Modulation (PAM), and all coding methods that try to reconstruct this pulse train are called waveform coding.

$T$ stands for the sampling interval, and $f_s = 1/T$ stands for the sampling rate or sampling frequency. To choose $T$ without any loss of information, the sampling theorem has to be borne in mind: a band-limited analog signal may be represented by time-discrete sample values taken at constant time intervals $T$ without any information loss if $f_s > 2 f_{\max}$, with sampling rate $f_s$. (This holds only for low-pass signals limited to a specified cut-off frequency, i.e. with spectrum $X(f) = 0$ for $|f| > f_{\max}$.) This means that the highest frequency component $f_{\max}$ in the analog signal to be sampled has to be lower than half of the sampling rate $f_s$. If this is not certain, it has to be guaranteed by low-pass filtering of the signal before the sampling process. Otherwise, the analog signal cannot be reconstructed from the samples without severe errors, commonly called aliasing.
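The effect of violating the sampling theorem can be illustrated with a minimal Python sketch. The sampling rate of 8 kHz and the two tone frequencies are assumptions chosen for illustration, and the helper names `sample_sine` and `alias_frequency` are hypothetical. A 7 kHz tone, which lies above half the sampling rate, yields exactly the same samples (up to sign) as a 1 kHz tone, so the two can no longer be told apart after sampling.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, num_samples):
    """Sample sin(2*pi*f*t) at the instants t = n*T, with T = 1/sample_rate_hz."""
    T = 1.0 / sample_rate_hz  # sampling interval in seconds
    return [math.sin(2.0 * math.pi * freq_hz * n * T) for n in range(num_samples)]

def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency in [0, f_s/2] onto which a sampled tone at freq_hz folds."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

f_s = 8000.0  # assumed sampling rate in Hz (illustrative value)

low_tone = sample_sine(1000.0, f_s, 16)   # 1 kHz < f_s/2: sampled correctly
high_tone = sample_sine(7000.0, f_s, 16)  # 7 kHz > f_s/2: violates the theorem

# The 7 kHz samples equal the 1 kHz samples with inverted sign,
# so the sum of corresponding samples is zero up to rounding error.
max_diff = max(abs(a + b) for a, b in zip(low_tone, high_tone))

print("largest |low + high| sample sum:", max_diff)
print("7 kHz tone appears at", alias_frequency(7000.0, f_s), "Hz")
```

This is why the low-pass filter has to be applied before sampling: once the samples have been taken, the aliased component cannot be separated from a genuine component at the lower frequency.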

While PAM offers time-multiplexing capabilities, the pulse amplitudes are still sensitive to noise.

Quantisation and coding


In a second step, quantisation, the sample amplitudes are binary coded with a word length of $w$ bits per sample in order to achieve an amplitude-discrete representation (linear Pulse Code Modulation, or Lin-PCM). For each sample, the closest of the $2^w$ possible amplitude values has to be chosen; the difference from the original amplitude is the quantisation error. While the bandwidth requirements increase by coding $w$ bits (pulses) per sample, the digital signal is resistant to added noise distortions, provided that the noise does not exceed one quantisation step and the signal amplitude does not exceed the maximum discrete amplitude range. In order to waste no quantisation steps or bits, the recording level has to be controlled so that the full recording range is used without overload. The quality of linear PCM is commonly described by the signal-to-noise ratio (SNR), the ratio of signal power to noise power.
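Linear quantisation and the resulting SNR can be sketched in a few lines of Python. The following is an illustration under stated assumptions, not a reference implementation: amplitudes are assumed to be normalised to the range from -1 to 1, each sample is reconstructed at the centre of its quantisation interval (one of several possible conventions), and the test signal is a nearly full-scale sine tone. The names `quantise_linear` and `snr_db` are hypothetical.

```python
import math

def quantise_linear(x, w, full_scale=1.0):
    """Quantise x with w bits; return (integer code, reconstructed amplitude)."""
    levels = 2 ** w                       # number of possible amplitude values
    step = 2.0 * full_scale / levels      # size of one quantisation step
    code = int(math.floor((x + full_scale) / step))
    code = max(0, min(levels - 1, code))  # overload: out-of-range values are clipped
    # Reconstruct at the centre of the quantisation interval (assumed convention).
    return code, -full_scale + (code + 0.5) * step

def snr_db(signal, reconstructed):
    """Signal-to-noise ratio in dB: signal power over quantisation noise power."""
    signal_power = sum(s * s for s in signal)
    noise_power = sum((s - r) ** 2 for s, r in zip(signal, reconstructed))
    return 10.0 * math.log10(signal_power / noise_power)

# Nearly full-scale sine tone, so that the full recording range is used.
signal = [0.999 * math.sin(2.0 * math.pi * 0.01234 * n) for n in range(20000)]

snr_by_bits = {}
for w in (8, 12, 16):
    reconstructed = [quantise_linear(s, w)[1] for s in signal]
    snr_by_bits[w] = snr_db(signal, reconstructed)
    print(f"w = {w:2d} bits: SNR = {snr_by_bits[w]:.1f} dB")
```

For a signal that uses the full range, each additional bit halves the quantisation step and therefore raises the SNR by roughly 6 dB. Reducing the amplitude of the test tone lowers the SNR accordingly, which is the reason for controlling the recording level.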

In addition to this linear time-invariant coding of the original sampled signal, various modifications have been proposed to take full advantage of the long-term or short-term characteristics of the speech signal [Rabiner & Schafer (1978)]. One of these methods is logarithmic PCM (A-law or $\mu$-law Log-PCM), which uses a higher quantiser resolution at small signal amplitudes and larger quantisation steps at high amplitudes. A further improvement can be achieved by continuously adapting the range of the quantiser to the short-time signal amplitude. A different category, the so-called parametric coding strategies, applies assumptions about the speech production process within an "intelligent" speech coder, thereby shifting the costs from the transmission line (where very low bit rates can be achieved) to the signal analysis and synthesis stage.
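The idea behind logarithmic PCM can be sketched with the continuous $\mu$-law compression curve followed by a uniform quantiser. This is a simplified illustration: the value $\mu = 255$, the 8-bit word length and the test amplitudes are assumptions, practical telephone codecs use a segmented approximation of this curve rather than the formula below, and all function names are hypothetical.

```python
import math

MU = 255.0  # assumed compression parameter (illustrative value)

def mu_compress(x, mu=MU):
    """Continuous mu-law compression of an amplitude x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_expand(y, mu=MU):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def quantise_uniform(x, w):
    """Round x in [-1, 1) to the centre of one of 2**w uniform intervals."""
    levels = 2 ** w
    step = 2.0 / levels
    code = max(0, min(levels - 1, int(math.floor((x + 1.0) / step))))
    return -1.0 + (code + 0.5) * step

def log_pcm(x, w):
    """Compress, quantise uniformly, then expand again."""
    return mu_expand(quantise_uniform(mu_compress(x), w))

def mean_abs_error(values, coder):
    """Average absolute quantisation error of coder over the given amplitudes."""
    return sum(abs(v - coder(v)) for v in values) / len(values)

quiet = [0.001 * k for k in range(1, 21)]     # small amplitudes, 0.001 to 0.02
loud = [0.503 + 0.02 * k for k in range(24)]  # large amplitudes, 0.503 to 0.963

quiet_err_lin = mean_abs_error(quiet, lambda v: quantise_uniform(v, 8))
quiet_err_log = mean_abs_error(quiet, lambda v: log_pcm(v, 8))
loud_err_lin = mean_abs_error(loud, lambda v: quantise_uniform(v, 8))
loud_err_log = mean_abs_error(loud, lambda v: log_pcm(v, 8))

print("quiet samples: linear", quiet_err_lin, "log", quiet_err_log)
print("loud samples:  linear", loud_err_lin, "log", loud_err_log)
```

With the same number of bits, the logarithmic coder gives a smaller error on the quiet samples and a larger error on the loud ones, which matches the trade-off described above: finer resolution at small amplitudes is paid for with coarser steps at high amplitudes.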


For further reading consult [O'Shaughnessy (1987), Rabiner & Schafer (1978), Pierce (1991)].


EAGLES SWLG SoftEdition, May 1997.