The characteristics of a voice vary over time, depending on how tired the speaker is, how stressed they are, what mood they are in, whether they have a cold, and so on. Moreover, it has often been observed that the behaviour of users changes as they become accustomed to a system. These trends can be grouped under the term temporal drift.
Temporal drift usually affects the performance of a speaker recognition system significantly. Intra-speaker variability within a single recording session is usually much smaller than inter-session variability. In practice, performance deteriorates markedly a few days, or even a few hours, after registration, compared with performance obtained on contemporaneous speech, i.e. when test utterances are pronounced immediately after the training phase has ended. A partial solution to temporal drift consists in using training material gathered over several sessions: as the collected data are more representative of intra-speaker variability over time, more robust speaker models can be built. However, this approach makes the registration process more burdensome.
When the targeted application is intended to operate over time, it is necessary to design an evaluation experiment for which the test material was recorded in several sessions, separated from each other by at least one day and covering a reasonable time span (at least a month). When multi-session recordings are available, the training material should be taken from the first recording session (or sessions, for multi-session training). Conversely, the material of a given session should never be split between training and testing, as this would lead to an unrealistic protocol.
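A minimal sketch of such a session-based partition is given below, assuming each recording is described by a (speaker id, session date, utterance id) tuple; the function name `split_by_session` and the data layout are illustrative, not taken from any specific toolkit. The earliest session(s) of each speaker go to training, the remaining sessions to testing, and no session is ever split across both sets.

```python
from collections import defaultdict
from datetime import date

def split_by_session(recordings, n_train_sessions=1):
    """Assign each speaker's earliest session(s) to training and all later
    sessions to testing; a session never appears in both sets."""
    by_speaker = defaultdict(lambda: defaultdict(list))
    for speaker_id, session_date, utterance_id in recordings:
        by_speaker[speaker_id][session_date].append(utterance_id)

    train, test = {}, {}
    for speaker_id, sessions in by_speaker.items():
        ordered = sorted(sessions)  # chronological order of session dates
        train[speaker_id] = [u for d in ordered[:n_train_sessions]
                             for u in sessions[d]]
        test[speaker_id] = [u for d in ordered[n_train_sessions:]
                            for u in sessions[d]]
    return train, test

# Hypothetical example: one speaker recorded in three sessions.
recordings = [
    ("spk01", date(2023, 3, 1), "utt001"),
    ("spk01", date(2023, 3, 1), "utt002"),
    ("spk01", date(2023, 3, 9), "utt003"),
    ("spk01", date(2023, 4, 2), "utt004"),
]
train, test = split_by_session(recordings, n_train_sessions=1)
# train["spk01"] -> ["utt001", "utt002"]; test["spk01"] -> ["utt003", "utt004"]
```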
When these constraints are fulfilled, the number of training and test sessions, and the time span covered by each phase, should be stated explicitly. Note that the number and time span of the training sessions have an influence on performance levels and on user acceptability, whereas the number and time span of the test sessions only affect the statistical validity of the evaluation results.
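As a small illustration of what should be reported, the hypothetical helper below computes the number of sessions and the time span (in days) covered by a phase from its session dates; the name `describe_phase` and the example dates are assumptions for the sake of the sketch.

```python
from datetime import date

def describe_phase(session_dates):
    """Summarise one phase of the protocol: distinct sessions and time span in days."""
    dates = sorted(set(session_dates))
    span_days = (dates[-1] - dates[0]).days if len(dates) > 1 else 0
    return {"n_sessions": len(dates), "timespan_days": span_days}

train_dates = [date(2023, 3, 1)]
test_dates = [date(2023, 3, 9), date(2023, 4, 2)]
print(describe_phase(train_dates))  # {'n_sessions': 1, 'timespan_days': 0}
print(describe_phase(test_dates))   # {'n_sessions': 2, 'timespan_days': 24}
```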