
Temporal drift

   

The characteristics of a voice vary over time, depending on how tired or stressed the speaker is, what mood he is in, whether he has a cold, etc. Moreover, it has often been noted that the behaviour of users changes as they become accustomed to a system. These trends can be grouped under the term temporal drift.

Temporal drift usually has a significant effect on the performance of a speaker recognition system. Intra-speaker variability within a single recording session is usually much smaller than inter-session variability. In practice, performance levels deteriorate significantly a few days, or even a few hours, after registration, compared with those obtained with contemporaneous speech, i.e. when test utterances are pronounced immediately after the training phase has ended. A partial solution to temporal drift consists in using training material gathered over several sessions: as the collected data are more representative of intra-speaker variability over time, more robust speaker models can be built. However, this approach makes the registration process more burdensome for the user.

When the targeted application is intended to operate over an extended period of time, it is necessary to design an evaluation experiment in which the test material is recorded in several sessions, separated from each other by at least one day and covering a reasonable time-span (at least a month). When multi-session recordings are available, the training material should be taken from the first recording session (or sessions, for multi-session training). Conversely, the material of a given session should never be split between training and testing, as this would lead to an unrealistic protocol.
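The session constraints above can be illustrated with a minimal Python sketch. It is not part of any standard evaluation toolkit: the function names, the (session_id, session_date, utterance_id) record layout, and the reading of "at least a month" as 30 days are all assumptions made for the example.

```python
from datetime import date


def split_by_session(recordings, n_train_sessions=1):
    """Split recordings into training and test material by whole sessions.

    `recordings` is a list of (session_id, session_date, utterance_id)
    tuples (a hypothetical layout). The earliest `n_train_sessions`
    sessions go to training; all later sessions go to testing, so no
    session is ever split between the two phases.
    """
    session_date = {sid: d for sid, d, _ in recordings}
    # Order sessions chronologically (session id breaks ties).
    ordered = sorted(session_date, key=lambda sid: (session_date[sid], sid))
    train_ids = set(ordered[:n_train_sessions])
    train = [r for r in recordings if r[0] in train_ids]
    test = [r for r in recordings if r[0] not in train_ids]
    return train, test


def check_test_sessions(test, min_gap_days=1, min_span_days=30):
    """Check that test sessions are at least `min_gap_days` apart and
    cover at least `min_span_days` (30 days is an assumed reading of
    "at least a month")."""
    dates = sorted({sid: d for sid, d, _ in test}.values())
    if len(dates) < 2:
        return False
    gaps_ok = all((b - a).days >= min_gap_days
                  for a, b in zip(dates, dates[1:]))
    span_ok = (dates[-1] - dates[0]).days >= min_span_days
    return gaps_ok and span_ok


if __name__ == "__main__":
    recordings = [
        ("s1", date(1997, 1, 6), "u1"),
        ("s1", date(1997, 1, 6), "u2"),
        ("s2", date(1997, 1, 8), "u3"),
        ("s3", date(1997, 2, 20), "u4"),
    ]
    train, test = split_by_session(recordings, n_train_sessions=1)
    print([r[2] for r in train], [r[2] for r in test])
    print(check_test_sessions(test))
```

Splitting on session identifiers rather than on individual utterances is what guarantees that no session contributes to both phases.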

When these constraints are fulfilled, the number of training and test sessions and the time-span covered by each phase should be stated explicitly. Note that the number and time-span of training sessions influence both performance levels and user acceptability, whereas the number and time-span of test sessions affect only the statistical validity of the evaluation results.
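As an illustration of the figures to be reported, the following sketch computes the number of sessions and the time-span for one phase from a list of session dates. The function name and the returned field names are hypothetical, and the dates are invented for the example.

```python
from datetime import date


def phase_summary(session_dates):
    """Return the number of sessions and the time-span in days
    (first to last session) for one phase of the protocol."""
    dates = sorted(session_dates)
    span = (dates[-1] - dates[0]).days if dates else 0
    return {"n_sessions": len(dates), "timespan_days": span}


if __name__ == "__main__":
    # Invented example dates, one per recording session.
    training = [date(1997, 1, 6), date(1997, 1, 13)]
    testing = [date(1997, 2, 3), date(1997, 2, 17), date(1997, 3, 10)]
    print("training:", phase_summary(training))
    print("testing:", phase_summary(testing))
```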

 




EAGLES SWLG SoftEdition, May 1997.