Training

The following description applies to speaker verification as well as to speaker identification. The system may require the speaker to utter a sentence or a sequence of words in order to provide samples of his speech. The comparison carried out during the exploitation phase uses a reference dictionary obtained during the training phase.
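The relation between the two phases can be illustrated by a minimal sketch. It assumes, purely for illustration, that each utterance has already been reduced to a list of fixed-length feature vectors and that a speaker is modelled by the average of these vectors; real systems generally use richer statistical models. All names (train, identify, verify, the speakers and the threshold value) are hypothetical.

```python
import math

def mean_vector(frames):
    """Average a list of equal-length feature vectors into a single vector."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]

def train(enrolment):
    """Training phase: build the reference dictionary {speaker: reference vector}."""
    return {speaker: mean_vector(frames) for speaker, frames in enrolment.items()}

def distance(a, b):
    """Euclidean distance between two vectors (an assumed, very simple measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(reference, frames):
    """Identification: return the enrolled speaker closest to the test utterance."""
    probe = mean_vector(frames)
    return min(reference, key=lambda s: distance(reference[s], probe))

def verify(reference, claimed, frames, threshold):
    """Verification: accept the claimed identity if the distance is small enough."""
    return distance(reference[claimed], mean_vector(frames)) <= threshold

# Toy enrolment data: two speakers, two feature vectors each (invented values).
enrolment = {
    "alice": [[1.0, 2.0], [1.2, 1.8]],
    "bob": [[4.0, 0.5], [3.8, 0.7]],
}
reference = train(enrolment)
test_utterance = [[1.1, 2.1]]
print(identify(reference, test_utterance))
print(verify(reference, "bob", test_utterance, threshold=0.5))
```

Both tasks use the same reference dictionary. Identification searches all entries, whereas verification consults only the entry of the claimed speaker and compares the result with a threshold.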

The training phase may be carried out off-line or on-line, at any of the following sites:

the technology provider site,
the application developer site,
the customer site.

The training phase may be carried out off-line using a particular platform, or on-line while the application is operating. The application developer has to know whether he (or the end-user) can carry out the training himself, or whether he will have to deliver the data to the technology provider, who will then supply the speech models.

The training material can be specified by the technology provider as a list of phonetically balanced sentences, well-chosen sequences of words, or data selected with respect to some particular criteria (e.g. phonetic coverage of the language). In some cases this material has to be collected and modelled by the application developer. In other cases this is done automatically during a training session that is treated as a black-box procedure. In all cases the technology provider should indicate the size and characteristics of the speech database needed to achieve the required performance.

If the training is carried out by the application developer, the system documentation should also indicate the know-how needed to exploit the technology to best effect. This may take the form of a list of appropriate phonetically balanced sentences for each language (where required), a tool to generate a minimal set of sentences or words, a selected list of words, etc.
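A tool for generating a small set of sentences could, for instance, follow a greedy coverage strategy. The sketch below assumes that each candidate sentence is already associated with the set of phonetic units it contains. The function name select_sentences, the sentence identifiers and the unit symbols are hypothetical. The greedy strategy gives a small set, but not necessarily the smallest possible one.

```python
def select_sentences(candidates, target_units):
    """Greedily pick sentences until every target unit is covered.

    candidates:   dict mapping a sentence identifier to the set of units it contains
    target_units: the units the training material should cover
    Returns the chosen sentences and any units that no candidate contains.
    """
    remaining = set(target_units)
    chosen = []
    while remaining:
        # Pick the sentence that covers the most units still missing.
        best = max(candidates, key=lambda s: len(remaining & candidates[s]))
        gain = remaining & candidates[best]
        if not gain:
            break  # the remaining units occur in no candidate sentence
        chosen.append(best)
        remaining -= gain
    return chosen, remaining

# Toy inventory: letters stand in for phonetic units (invented data).
candidates = {
    "s1": {"a", "b", "c"},
    "s2": {"c", "d"},
    "s3": {"d", "e", "f"},
    "s4": {"a", "f"},
}
chosen, missing = select_sentences(candidates, {"a", "b", "c", "d", "e", "f"})
print(chosen, missing)
```

Units reported as missing indicate that further candidate sentences have to be written or collected before the required coverage can be reached.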

If the training is carried out off-line using a database that has to be recorded beforehand, the application developer has to know what processing is necessary to obtain a usable corpus. This may include speech segmentation, speech labelling using phonetic labels, orthographic transcription, etc. Consequently the application developer should request an adequate development platform with suitable tools, such as a speech recording and analysis environment.
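As an indication of what a segmentation tool does, the following sketch separates speech from silence by thresholding the mean energy of fixed-length frames. This is only one simple technique, chosen here for illustration. The function name segment_by_energy, the frame length and the threshold are assumptions, and a practical tool would need smoothing and manual correction.

```python
def segment_by_energy(samples, frame_len, threshold):
    """Return (start, end) sample indices of runs of frames above the threshold."""
    segments = []
    start = None
    n_frames = len(samples) // frame_len  # an incomplete final frame is ignored
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold and start is None:
            start = i * frame_len                      # a speech segment begins
        elif energy <= threshold and start is not None:
            segments.append((start, i * frame_len))    # the segment ends
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))  # segment runs to the end
    return segments

# Toy signal: silence, a burst of "speech", then silence (invented values).
signal = [0.0] * 4 + [1.0] * 8 + [0.0] * 4
print(segment_by_energy(signal, frame_len=4, threshold=0.1))
```

The resulting segment boundaries can then be paired with phonetic labels or orthographic transcriptions to form the training corpus.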

EAGLES SWLG SoftEdition, May 1997.