Until about a century ago, to be within range of the voice was, under most circumstances, to be within range of the eyes, and the face or silhouette was no doubt much more important to humans than the voice as a means of identifying each other, with the exception of special cases such as the blind. With the development of telecommunications and acoustic recordings, the need for speaker recognition has grown.
The basis for automatic speaker classification and recognition is that, in addition to the linguistic message, the human voice conveys a good deal of paralinguistic information about the speaker, i.e. the ``encoder''. These speaker-dependent factors are well-known obstacles to speech recognition, as they increase the variability of the speech signal.
The main sources of a speaker's specificity are the physiological configuration of his speech production organs, his neuro-motor control of these organs, and his internal speech pattern prototypes. In practice, there may exist more or less systematic correlations between these factors and some of the speaker's characteristics, such as his sex, age, health condition, mood, regional, cultural and educational background, possible foreign accent, and the language he is speaking.
In this chapter, we address a class of pattern recognition problems where the goal is to classify a speech pattern according to some characteristics of the speaker who uttered it. We recommend the general term speaker classification to denote such problems.