Recommendations


The description of an evaluation experiment or an assessment protocol concerning a speaker classification or recognition system should explicitly report on the following items:

Speech quality
-
the general speech acquisition and transmission characteristics, in particular the signal bandwidth, the nature of the noise, the signal-to-noise ratio (SNR) when measurable, the characteristics of the transmission line, etc.
-
the speech quality factors that remain constant across training sessions and test sessions, in particular concerning the environment, the microphone, the channel, etc.
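As an illustration of how a reported SNR figure might be obtained, here is a minimal sketch. It assumes that a noise-only segment is available alongside a speech segment, and that the noise is stationary and uncorrelated with the speech, so that powers add. The function names and the sample values are hypothetical.

```python
import math

def mean_power(samples):
    """Mean squared amplitude of a list of samples."""
    return sum(x * x for x in samples) / len(samples)

def estimate_snr_db(speech_segment, noise_segment):
    """Estimate the SNR in dB from a speech-plus-noise segment
    and a noise-only segment (hypothetical helper)."""
    noise_power = mean_power(noise_segment)
    # Assumption: noise is stationary and uncorrelated with the speech,
    # so the speech-segment power is signal power plus noise power.
    signal_power = mean_power(speech_segment) - noise_power
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("SNR is not measurable from these segments")
    return 10.0 * math.log10(signal_power / noise_power)

# Toy samples, purely illustrative.
speech = [3.0, -3.0, 3.0, -3.0]  # mean power 9.0 (speech plus noise)
noise = [1.0, -1.0, 1.0, -1.0]   # mean power 1.0 (noise only)
snr = estimate_snr_db(speech, noise)
```

When no noise-only segment can be isolated, the SNR is not measurable in this sense, and the report should say so rather than give a figure.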
Temporal drift
-
the way the speech material is split between training and test material in relation to the chronological order of its recording; for each speaker, test material should always be recorded after the latest training material.
-
the average number of training sessions that are necessary to register one new speaker; in practice, the number of distinct occasions on which the speaker has to be called in.
-
the average registration timespan elapsed between the first training session and the last training session, for one new speaker.
-
the average number of test sessions per registered user, taken into account in the evaluation.
-
the average operation timespan elapsed between the first test session and the last test session, per registered user, during the evaluation.
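The temporal drift items above can be derived mechanically from a session log. The following is a minimal sketch, assuming a hypothetical log of (speaker, role, recording date) entries in which every speaker has at least one training session and one test session. The field names, dates and function name are illustrative only.

```python
from datetime import date
from statistics import mean

# Hypothetical session log: (speaker_id, role, recording_date).
SESSIONS = [
    ("spk1", "train", date(1996, 1, 10)),
    ("spk1", "train", date(1996, 1, 24)),
    ("spk1", "test", date(1996, 3, 1)),
    ("spk1", "test", date(1996, 4, 1)),
    ("spk2", "train", date(1996, 2, 1)),
    ("spk2", "test", date(1996, 2, 15)),
]

def temporal_report(sessions):
    """Summarise the temporal drift items; assumes each speaker has at
    least one training session and one test session."""
    by_speaker = {}
    for speaker, role, day in sessions:
        by_speaker.setdefault(speaker, {"train": [], "test": []})[role].append(day)

    def span_days(days):
        return (max(days) - min(days)).days

    records = list(by_speaker.values())
    return {
        # Speakers whose earliest test session is not strictly after their
        # latest training session (same-day recordings count as violations).
        "violations": sorted(
            spk for spk, s in by_speaker.items()
            if min(s["test"]) <= max(s["train"])
        ),
        "avg_training_sessions": mean(len(s["train"]) for s in records),
        "avg_registration_span_days": mean(span_days(s["train"]) for s in records),
        "avg_test_sessions": mean(len(s["test"]) for s in records),
        "avg_operation_span_days": mean(span_days(s["test"]) for s in records),
    }

report = temporal_report(SESSIONS)
```

With the toy log, no speaker violates the chronological constraint, and the averages are 1.5 training sessions over 7 days and 1.5 test sessions over 15.5 days per speaker.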
Speech quantity and variety
-
the average speech quantity per training session, i.e. the amount of speech used per speaker for one training session, and, if relevant, the average percentage of effective training speech quantity, i.e. the proportion of training speech which is actually used to build the registered speaker models.
-
the average speech quantity per test session, i.e. the amount of speech used per speaker for one test session, and, if relevant, the average percentage of effective speech quantity per test session, i.e. the proportion of test speech which is actually used to identify or verify the speaker in test mode.
-
the qualitative description or characterisation of training and test linguistic content.
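As a rough sketch of the speech quantity figures, the example below assumes that speech quantity is measured in seconds and that, for each session, both the recorded duration and the duration actually used by the system are known. It averages the per-session percentages, which is one possible reading of "average percentage"; the ratio of the totals would be another. All names and values are hypothetical.

```python
from statistics import mean

# Hypothetical training sessions: (speaker_id, recorded_seconds, used_seconds).
TRAINING = [
    ("spk1", 60.0, 45.0),
    ("spk1", 40.0, 30.0),
    ("spk2", 50.0, 25.0),
]

def quantity_report(sessions):
    """Return (average seconds per session, average effective percentage)."""
    avg_quantity = mean(recorded for _, recorded, _ in sessions)
    # Assumption: average the per-session percentages, not the ratio of totals.
    avg_effective_pct = mean(100.0 * used / recorded for _, recorded, used in sessions)
    return avg_quantity, avg_effective_pct

avg_quantity, avg_effective_pct = quantity_report(TRAINING)
```

The same function can be applied to a list of test sessions to obtain the corresponding test-mode figures.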
Speaker population size and typology
-
for speaker identification, the registered speaker population size, i.e. the number of registered users; this figure can also be reported for speaker verification experiments, but only as an indication of the statistical validity of the results.
-
the proportion of male and female registered speakers as well as any other relevant characteristics of the typology of registered speakers, when known; in particular, concerning the age, the dialectal origin, whether they are native or non-native speakers, etc. In parallel, any geographical, physiological, psychological or sociological feature that would be common to the registered population members (or to a majority of them) should be identified and reported.
-
for speaker verification (and open-set identification), the origin of pseudo-impostors, i.e. whether they are chosen among the registered speakers or among an external pseudo-impostor population; in the latter case, the number of external pseudo-impostors, the proportion of male and female speakers, the population typology, the speech quantity per session (per pseudo-impostor) and the number of training sessions (per pseudo-impostor) should be reported.
-
for speaker verification (and open-set identification), the origin of test impostors, i.e. whether they are chosen among the registered speakers (but claiming a false identity), among the pseudo-impostors, or among an external test impostor population. The last approach is by far the most realistic. However, when it is not feasible, an impostor utterance should never be tested against a registered speaker whose bundle of pseudo-impostors contains the test impostor.
-
when an external test impostor population is used, the number of external test impostors, the proportion of male and female speakers, the population typology, and in particular how their profile differs from the registered population and from the pseudo-impostor population should be described. The speech quantity per session (per test impostor) should be reported, as an indication of the statistical confidence of the evaluation results.
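The constraint on test impostors drawn from non-external populations can be enforced when the trial list is built. The sketch below assumes a hypothetical mapping from each registered speaker to his or her bundle of pseudo-impostors, and drops any trial in which the test impostor belongs to the target's bundle. The speaker names and the function name are illustrative.

```python
# Hypothetical bundles: registered speaker -> set of pseudo-impostors.
PSEUDO_IMPOSTORS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"dave", "erin"},
}

# Hypothetical test impostors: some registered, one pseudo-impostor, one external.
TEST_IMPOSTORS = ["alice", "bob", "dave", "frank"]

def allowed_impostor_trials(pseudo_impostors, test_impostors):
    """List (impostor, claimed identity) pairs that respect the constraint."""
    trials = []
    for impostor in test_impostors:
        for target, bundle in pseudo_impostors.items():
            if impostor == target:
                continue  # claiming one's own identity is not an impostor trial
            if impostor in bundle:
                continue  # impostor was used as a pseudo-impostor for this target
            trials.append((impostor, target))
    return trials

trials = allowed_impostor_trials(PSEUDO_IMPOSTORS, TEST_IMPOSTORS)
```

In this toy setting the external speaker "frank" can be tested against every registered identity, whereas the other test impostors are each excluded from at least one target.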
Speaker purpose and other human factors
-
the general purpose of the system, and whether the evaluation data are adequate for this purpose.
-
the intention of registered speakers, i.e. whether, in the test mode, they are cooperative speakers, uncooperative speakers, or if they behave as casual registered speakers.
-
the intention of impostors, i.e. whether they are well-intentioned impostors, casual impostors or intentional impostors. In the case of intentional impostors, the amount of knowledge they have about the true speaker should be specified, in particular whether they are acquainted by voice with the genuine speaker, and whether or not they are provided with the password, for text-dependent systems.
-
the impostor test configuration, i.e. the simulated (or real-life) strategy an impostor follows in choosing which identity to claim: for instance, an exhaustive attempt, if each impostor tries each registered identity, or a selective attempt, if a certain criterion guides the impostor's choice, in which case the criterion should be stated. With laboratory recordings of casual speakers, we recommend the same-sex selective attempt configuration and the cross-sex selective attempt configuration, to which can be added the selective attempt towards the nearest registered speaker, especially for comparative evaluation of two systems on the same database.
-
the stakes of the system, i.e. what are the sources of motivation for registered speakers to be recognised (or not recognised), and those of an impostor to be accepted (or rejected).
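The impostor test configurations mentioned above can be sketched as a choice of which (impostor, claimed identity) pairs enter the evaluation. The example assumes an external impostor population with a known sex label per speaker, and assumes that in a selective configuration every registered identity satisfying the criterion is claimed. The nearest-registered-speaker configuration is left out, since it requires a speaker distance that is not defined here. All names are hypothetical.

```python
# Hypothetical populations: speaker -> sex label.
REGISTERED = {"r1": "f", "r2": "m", "r3": "f"}
IMPOSTORS = {"i1": "f", "i2": "m"}

def impostor_claims(registered, impostors, configuration):
    """List (impostor, claimed identity) pairs for a given configuration."""
    claims = []
    for impostor, impostor_sex in impostors.items():
        for target, target_sex in registered.items():
            if configuration == "exhaustive":
                keep = True  # each impostor tries each registered identity
            elif configuration == "same-sex":
                keep = impostor_sex == target_sex
            elif configuration == "cross-sex":
                keep = impostor_sex != target_sex
            else:
                raise ValueError(f"unknown configuration: {configuration}")
            if keep:
                claims.append((impostor, target))
    return claims

exhaustive = impostor_claims(REGISTERED, IMPOSTORS, "exhaustive")
same_sex = impostor_claims(REGISTERED, IMPOSTORS, "same-sex")
cross_sex = impostor_claims(REGISTERED, IMPOSTORS, "cross-sex")
```

Under these assumptions the same-sex and cross-sex claim lists partition the exhaustive list, so results for the two selective configurations can be reported separately on the same recordings.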


EAGLES SWLG SoftEdition, May 1997.