Typology of applications

It is unrealistic to review every application that could benefit from knowledge (recognition) of speaker characteristics. The overlap with the applications of speech technologies could become increasingly important in the future. Techniques for speaker recognition can be used for speaker selection or for the adaptation of multi-speaker or speaker-independent speech recognition systems, and can bring improvements to speech processing techniques in general, including synthesis and coding.

In Automatic Speech Recognition, it may be important to recognise the accent (regional, foreign) of the speaker, his speaking rate, his style, his mood, etc. The recogniser itself could be adapted dynamically to the speaker.
In Speech Synthesis, the control of voice characteristics offers the possibility of simulating any speaker, of conveying emotion, etc.
Pronunciation training aids and reeducation software could be adapted to the speaker.
Speech coders could also adapt to the voice of the talker.

In most of these cases, it may be inappropriate to treat the linguistic and paralinguistic aspects of the speech signal separately. The paralinguistic aspects are certainly more important in the following tasks:

identification or identity verification
speaker change detection
speech pathology detection and evaluation

Speaker change detection can be used for automatic speaker labelling in recordings or focussing on the current speaker in video conferences. For some multimedia applications, it may be very useful to access, speaker by speaker, the recording of a conversation or of a radio or television program. Here, the speech to be processed is reasonably sequential, which makes the algorithms usually more efficient, but it may happen that several speakers talk at the same time.

Most of the following discussion will focus on speaker verification. In this context, the most obvious dichotomy separates remote (telecom) applications from local (on-site) applications. Remote applications are typically performed over the telephone. A major problem with the telephone is the diversity of telephone sets and channel paths. The microphone and the environment are much better controlled in local applications.

Telecommunication (remote) applications

One of the most obvious uses of speaker recognition techniques is caller authentication over the telephone network. In such applications, the main task is therefore speaker verification. The user claims his identity, most of the time by dialling (or saying) a personal code number. Then, either a code word is required to authenticate the speaker, or his utterance of the code number is itself used for verification.
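The two-step procedure described above (an identity claim, followed by verification of a spoken code word or of the spoken code number itself) can be sketched as follows. This is a minimal illustration, not a description of any deployed system: the names `Caller`, `verify_caller`, the `score` function and the threshold are all hypothetical, and the scoring function is assumed to return a higher value when the utterance matches the enrolled voice model better.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical scorer: (audio, enrolled model id) -> similarity score.
# A real system would plug in its own speaker verification back end here.
Scorer = Callable[[bytes, str], float]


@dataclass
class Caller:
    claimed_code: str                            # personal code number, dialled or spoken
    code_utterance: Optional[bytes] = None       # audio of the spoken code (None if dialled)
    code_word_utterance: Optional[bytes] = None  # audio of a separate code word, if requested


def verify_caller(caller: Caller, enrolled: Dict[str, str],
                  score: Scorer, threshold: float) -> bool:
    """Return True if the caller's voice matches the identity he claims."""
    model_id = enrolled.get(caller.claimed_code)
    if model_id is None:
        return False  # the claimed identity is unknown
    # Use the dedicated code word when there is one, otherwise fall back
    # to the utterance of the code number.
    if caller.code_word_utterance is not None:
        audio = caller.code_word_utterance
    else:
        audio = caller.code_utterance
    if audio is None:
        return False  # code was dialled and no speech is available
    return score(audio, model_id) >= threshold
```

Note that when the code number is dialled rather than spoken, this sketch has nothing to verify unless a code word is also requested, which corresponds to the first of the two options mentioned in the text.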

Typical application areas are:

calling cards with home bill charge
banking (checking balance, transfer of funds)
payment by credit card: remote payment by credit card is usually accepted by providing the name of the owner, the card number and the expiration date (all this information is printed on the card)
teleshopping (grant authorisation to transfer money)
stock exchange operations (purchasing, selling stocks)
home incarceration, alcohol rehabilitation program
military applications
access to or modification of information on remote servers (restrict access to authorised users)
access to computing facilities from a remote terminal (coupled with encryption), where passwords are currently used, and alternatives necessitate non-standard equipment (card reader, scanner, electronic pen, encrypted key, ...)

The main applications for speaker verification over the telephone network are of two kinds: banking and remote transactions on the one hand, and access to licensed databases on the other. The two fields obviously do not require the same level of security: it is usually less costly to let an unauthorised person access a database than to let somebody carry out a transaction that can involve large amounts of money. In many countries, it is possible to pay by credit card over the phone with no verification of the customer other than the consistency between the customer's name, his credit card number and its expiry date (all of which appear on the credit card itself!). Some kind of basic speaker verification (for instance user-specific text-dependent verification), even with tolerant thresholds, could certainly bring more security to this type of transaction. Note that, in this case, the speaker characteristics should be centralised in a single place that delivers the transaction authorisations, and each supplier would need special equipment; however, the verification could take place off-line, because the confirmation of the transaction does not usually need to be immediate.
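The effect of a tolerant versus a strict threshold can be illustrated by counting errors on two sets of verification scores. The sketch below assumes that a claim is accepted when its score reaches the threshold; the function name `error_rates` is hypothetical and the scores are invented purely for illustration, not taken from any evaluation.

```python
from typing import Sequence, Tuple


def error_rates(genuine: Sequence[float], impostor: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false rejection rate, false acceptance rate) at a threshold.

    A claim is accepted when its score is >= threshold.
    """
    false_rejections = sum(1 for s in genuine if s < threshold)
    false_acceptances = sum(1 for s in impostor if s >= threshold)
    return false_rejections / len(genuine), false_acceptances / len(impostor)


# Invented scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.4]    # true speaker claiming his own identity
impostor_scores = [0.1, 0.2, 0.5, 0.6]   # other speakers claiming that identity

tolerant = error_rates(genuine_scores, impostor_scores, threshold=0.3)
strict = error_rates(genuine_scores, impostor_scores, threshold=0.75)
print("tolerant:", tolerant)  # no false rejections, some false acceptances
print("strict:  ", strict)    # no false acceptances, some false rejections
```

On these toy scores the tolerant setting rejects no genuine user but accepts half of the impostor attempts, while the strict setting does the opposite. A telephone payment that currently has no speaker check at all gains security even at the tolerant setting, whereas a high-value transaction would call for a stricter one.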

On-site (local) applications

With on-site applications, the person whose identity is to be verified must be either physically present in one particular location or in direct contact with a device under the control of the service provider. This gives the product designer some freedom to choose from a variety of techniques. Keys, badges and codes are used most frequently. However, they may not ensure a sufficient level of security. In such cases, biometric verifiers offer an alternative (or a complement).

Typical applications are:

access to or control of equipment,
access to a secured area (nuclear plant, military premises),
voice key (home, car),
mobile telephone, personal assistant (responds only to the voice of its owner),
Automatic Teller Machine (ATM).

These applications are the on-site equivalent of database access and remote transactions over the telephone. However, they differ from telecommunication applications in four respects:

Environment factors and the signal bandwidth can be more easily controlled.
Automatic verification can send an alarm in case of doubt.
The customer can carry his voice characteristics with him (on an intelligent card, for instance).
The voice verification technique can be associated more easily with additional identity verification (multimodal) techniques.

For example, a voice verification system for cash dispensers could simply refuse the transaction (or limit its amount) if the voice characteristics do not sufficiently match the identity corresponding to the Personal Identification Number (PIN). Note that, even if a thief steals a credit card with the PIN attached to it, he may still not know the voice of the user, and may have difficulty obtaining information about it. Even if he knows the voice of the owner, he may be unable to imitate it. Even if some false acceptances take place, the number of fraudulent transactions will necessarily be reduced. However, if too many false rejections occur, the bank may lose some of its clients. Since the probability of an impostor having a voice similar to the user's is small, the system can be quite tolerant and still reduce the number of fraudulent withdrawals and deter impostors, without offending a significant subset of the regular users. If additional procedures are put into action, such as taking a picture of the user when his identity is in doubt, or performing some kind of face recognition, the deterrent effect is reinforced, since a possible impostor must resort to a more elaborate and less casual strategy. For this kind of application, the voice characteristics of the speaker can be stored on the magnetic stripe or the chip of his card, so centralised access is not required. The verification must take place in nearly real time to avoid undesirable delays in the transaction.
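The decision policy outlined in this paragraph (allow the withdrawal, limit its amount, or refuse it, with a picture taken in case of doubt) might be sketched as below. The function `atm_decision`, the two thresholds and the reduced limit are hypothetical parameters introduced for illustration; the text does not prescribe particular values or a particular ordering of the fallbacks.

```python
from enum import Enum
from typing import Tuple


class Decision(Enum):
    ALLOW = "allow"
    LIMIT = "limit amount"
    REFUSE = "refuse"


def atm_decision(score: float, amount: float,
                 accept_threshold: float, doubt_threshold: float,
                 reduced_limit: float) -> Tuple[Decision, float, bool]:
    """Return (decision, authorised amount, take_picture).

    score: similarity between the voice and the model tied to the PIN
           (assumed here to be higher for a better match).
    """
    if score >= accept_threshold:
        # Voice matches well enough: authorise the requested amount.
        return Decision.ALLOW, amount, False
    if score >= doubt_threshold:
        # Doubtful match: cap the amount and photograph the user.
        return Decision.LIMIT, min(amount, reduced_limit), True
    # Clear mismatch: refuse, and keep a picture as a deterrent.
    return Decision.REFUSE, 0.0, True
```

Lowering `accept_threshold` makes the system more tolerant, which reduces the false rejections that could drive clients away at the price of more false acceptances.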

In the context of access control, a reasonable system would be based on rather strict verification, with a "call for assistance" procedure in case of rejection.


EAGLES SWLG SoftEdition, May 1997.