The description of an evaluation experiment or an assessment protocol
concerning a speaker classification or
recognition system should explicitly report on the following items:
- Speech quality
  - the general speech acquisition and transmission characteristics, in
    particular the signal bandwidth, the nature of the noise, the
    signal-to-noise ratio (SNR) when measurable, the characteristics of the
    transmission line, ...
  - the speech quality factors that remain constant across training sessions
    and test sessions, in particular concerning the environment, the
    microphone, the channel, ...
- Temporal drift
  - how the speech material is split between training and test material with
    respect to the chronological order of its recording; for each speaker,
    test material should always have been recorded after the latest training
    material.
  - the average number of training sessions needed to register a new speaker;
    in practice, the number of separate occasions on which the speaker has to
    be called in.
  - the average registration timespan, i.e. the time elapsed between the first
    training session and the last training session, for a new speaker.
  - the average number of test sessions per registered user that are taken
    into account in the evaluation.
  - the average operation timespan, i.e. the time elapsed between the first
    test session and the last test session, per registered user, during the
    evaluation.
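The temporal drift items above can be derived mechanically from a session log. The following is a minimal sketch, assuming each session is recorded with a speaker label, a recording date and a role; the `Session` record, the `"train"`/`"test"` labels and the function names are hypothetical and not part of the handbook. Dates have day granularity here, so a test session recorded on the same day as a training session is counted as violating the chronological rule.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class Session:
    speaker: str
    day: date
    role: str  # "train" or "test" (hypothetical labels)


def group_by_speaker(sessions):
    """Map each speaker to {'train': [dates], 'test': [dates]}."""
    grouped = {}
    for s in sessions:
        grouped.setdefault(s.speaker, {"train": [], "test": []})[s.role].append(s.day)
    return grouped


def split_is_chronological(sessions):
    """True if, per speaker, every test session is later than the latest training session."""
    return all(
        min(days["test"]) > max(days["train"])
        for days in group_by_speaker(sessions).values()
        if days["train"] and days["test"]
    )


def span_in_days(days):
    """Days elapsed between the first and the last session of a list."""
    return (max(days) - min(days)).days


def drift_report(sessions):
    """Average session counts and timespans (in days), over speakers having such sessions."""
    grouped = list(group_by_speaker(sessions).values())
    train = [d["train"] for d in grouped if d["train"]]
    test = [d["test"] for d in grouped if d["test"]]
    return {
        "avg_training_sessions": mean(len(t) for t in train),
        "avg_registration_timespan_days": mean(span_in_days(t) for t in train),
        "avg_test_sessions": mean(len(t) for t in test),
        "avg_operation_timespan_days": mean(span_in_days(t) for t in test),
    }


# Invented example data, for illustration only.
sessions = [
    Session("A", date(1996, 1, 10), "train"),
    Session("A", date(1996, 1, 24), "train"),
    Session("A", date(1996, 2, 7), "test"),
    Session("A", date(1996, 3, 7), "test"),
    Session("B", date(1996, 1, 12), "train"),
    Session("B", date(1996, 2, 1), "test"),
]
ok = split_is_chronological(sessions)
report = drift_report(sessions)
```

Running the chronological check before computing any score makes a violation of the ordering rule visible before it can flatter the results.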
- Speech quantity and variety
  - the average speech quantity per training session, i.e. the amount of
    speech used per speaker for one training session, and, if relevant, the
    average percentage of effective training speech quantity, i.e. the
    proportion of training speech that is actually used to build the
    registered speaker models.
  - the average speech quantity per test session, i.e. the amount of speech
    used per speaker for one test session, and, if relevant, the average
    percentage of effective speech quantity per test session, i.e. the
    proportion of test speech that is actually used to identify or verify the
    speaker in test mode.
  - a qualitative description or characterisation of the linguistic content
    of the training and test material.
- Speaker population size and typology
  - for speaker identification, the registered speaker population size, i.e.
    the number of registered users; this figure can also be reported for
    speaker verification experiments, but only as an indication of the
    statistical validity of the results.
  - the proportion of male and female registered speakers, as well as any
    other relevant characteristics of the typology of registered speakers,
    when known, in particular their age, their dialectal origin, whether they
    are native or non-native speakers, etc. In addition, any geographical,
    physiological, psychological or sociological feature that is common to
    the members of the registered population (or to a majority of them)
    should be identified and reported.
  - for speaker verification (and open-set identification), the origin of
    pseudo-impostors, i.e. whether they are chosen among the registered
    speakers or among an external pseudo-impostor population; in the latter
    case, the number of external pseudo-impostors, the proportion of male and
    female speakers, the population typology, the speech quantity per session
    (per pseudo-impostor) and the number of training sessions (per
    pseudo-impostor) should be reported.
  - for speaker verification (and open-set identification), the origin of
    test impostors, i.e. whether they are chosen among the registered
    speakers (but claiming a false identity), among the pseudo-impostors, or
    among an external test impostor population. The last approach is by far
    the most realistic. However, when it is not feasible, an impostor
    utterance should never be tested against a registered speaker whose
    bundle of pseudo-impostors contains the test impostor.
  - when an external test impostor population is used, the number of external
    test impostors, the proportion of male and female speakers, the
    population typology, and in particular how their profile differs from
    that of the registered population and of the pseudo-impostor population
    should be described. The speech quantity per session (per test impostor)
    should be reported, as an indication of the statistical confidence of the
    evaluation results.
- Speaker purpose and other human factors
  - the general purpose of the system, and whether the evaluation data are
    adequate for this purpose.
  - the intention of registered speakers, i.e. whether, in test mode, they
    are cooperative speakers, uncooperative speakers, or behave as casual
    registered speakers.
  - the intention of impostors, i.e. whether they are well-intentioned
    impostors, casual impostors or intentional impostors. In the case of
    intentional impostors, the amount of knowledge they have about the true
    speaker should be specified, in particular whether they are familiar with
    the genuine speaker's voice and, for text-dependent systems, whether or
    not they are given the password.
  - the impostor test configuration, i.e. the simulated (or real-life)
    strategy an impostor follows in choosing which identity to claim: for
    instance, an exhaustive attempt, if each impostor tries each registered
    identity, or a selective attempt, if a certain criterion guides the
    impostor's choice, in which case the criterion should be stated. With
    laboratory recordings of casual speakers, we recommend the same-sex
    selective attempt configuration and the cross-sex selective attempt
    configuration, to which can be added the selective attempt towards the
    nearest registered speaker, especially for the comparative evaluation of
    two systems on the same database.
  - the stakes of the system, i.e. the sources of motivation for registered
    speakers to be recognised (or not recognised), and for an impostor to be
    accepted (or rejected).
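The impostor test configurations named in the list above (exhaustive, same-sex selective, cross-sex selective) can be illustrated by enumerating the (impostor, claimed identity) pairs they produce. What follows is only a sketch under stated assumptions: the `Speaker` record, the `bundles` mapping (registered speaker name to the names in that speaker's bundle of pseudo-impostors) and the function `impostor_trials` are hypothetical names, and the selective attempt towards the nearest registered speaker is left out because it needs a speaker distance measure that the text does not specify. The sketch also applies the rule that an impostor is never tested against a registered speaker whose bundle of pseudo-impostors contains that impostor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Speaker:
    name: str
    sex: str  # "F" or "M" (hypothetical coding)


def impostor_trials(impostors, registered, bundles, config="exhaustive"):
    """List (impostor name, claimed identity) pairs for one test configuration.

    config is "exhaustive", "same-sex" or "cross-sex".
    bundles maps a registered speaker's name to the set of pseudo-impostor
    names associated with that speaker.
    """
    if config not in ("exhaustive", "same-sex", "cross-sex"):
        raise ValueError(f"unknown configuration: {config}")
    trials = []
    for imp in impostors:
        for target in registered:
            if imp.name == target.name:
                continue  # a true claim, not an impostor attempt
            if imp.name in bundles.get(target.name, set()):
                continue  # impostor belongs to the target's pseudo-impostor bundle
            if config == "same-sex" and imp.sex != target.sex:
                continue
            if config == "cross-sex" and imp.sex == target.sex:
                continue
            trials.append((imp.name, target.name))
    return trials


# Invented example: test impostors are drawn from the registered speakers.
registered = [Speaker("ann", "F"), Speaker("bob", "M"), Speaker("cat", "F")]
bundles = {"ann": {"cat"}, "bob": {"ann"}, "cat": set()}

exhaustive = impostor_trials(registered, registered, bundles, "exhaustive")
same_sex = impostor_trials(registered, registered, bundles, "same-sex")
cross_sex = impostor_trials(registered, registered, bundles, "cross-sex")
```

Reporting the number of trials obtained under each configuration, alongside the criterion used, lets a reader judge how comparable two evaluations are.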
EAGLES SWLG SoftEdition, May 1997.