The description of an evaluation experiment or an assessment protocol concerning a speaker classification or recognition system should explicitly report on the following items:
- Speech quality
  - the general speech acquisition and transmission characteristics, in particular the signal bandwidth, the nature of the noise, the signal-to-noise ratio (SNR) when measurable, the characteristics of the transmission line, ...
  - the speech quality factors that remain constant across training sessions and test sessions, in particular concerning the environment, the microphone, the channel, ...
 
- Temporal drift
  - the way the speech material is split between training and test material in relation to the chronological order of its recording; for each speaker, test material should always be recorded after the latest training material.
  - the average number of training sessions that are necessary to register one new speaker, i.e. in practice the number of separate occasions on which the speaker has to attend.
  - the average registration timespan, i.e. the time elapsed between the first training session and the last training session, for one new speaker.
  - the average number of test sessions per registered user taken into account in the evaluation.
  - the average operation timespan, i.e. the time elapsed between the first test session and the last test session, per registered user, during the evaluation.
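
The temporal drift items above can be derived mechanically from a session log. The following is a minimal sketch, assuming a hypothetical record layout of `(speaker_id, kind, recording_date)` tuples with one record per session; the function name `temporal_report` and the dictionary keys are illustrative only and are not prescribed by this handbook.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def temporal_report(sessions):
    """Summarise temporal-drift figures from a session log.

    `sessions` is an iterable of (speaker_id, kind, recording_date) tuples,
    where kind is "train" or "test". This layout is an assumption made for
    the sketch, with one tuple per session.
    """
    by_speaker = defaultdict(lambda: {"train": [], "test": []})
    for speaker, kind, day in sessions:
        by_speaker[speaker][kind].append(day)

    violations = []
    train_counts, train_spans, test_counts, test_spans = [], [], [], []
    for speaker, s in sorted(by_speaker.items()):
        train, test = s["train"], s["test"]
        # Test material must come after the latest training material.
        # Same-day recordings are flagged too, because dates alone
        # cannot order them (an assumption of this sketch).
        if train and test and min(test) <= max(train):
            violations.append(speaker)
        if train:
            train_counts.append(len(train))
            train_spans.append((max(train) - min(train)).days)
        if test:
            test_counts.append(len(test))
            test_spans.append((max(test) - min(test)).days)

    def avg(values):
        return mean(values) if values else None

    return {
        "violations": violations,
        "avg_training_sessions": avg(train_counts),
        "avg_registration_days": avg(train_spans),
        "avg_test_sessions": avg(test_counts),
        "avg_operation_days": avg(test_spans),
    }


log = [
    ("a", "train", date(1997, 1, 1)),
    ("a", "train", date(1997, 1, 11)),
    ("a", "test", date(1997, 2, 1)),
    ("a", "test", date(1997, 2, 3)),
    ("b", "train", date(1997, 1, 5)),
    ("b", "test", date(1997, 1, 2)),  # recorded before training: flagged
]
report = temporal_report(log)
```

Speakers listed under `violations` have test material that does not follow their latest training session, so the split should be revised before any results are reported.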
 
- Speech quantity and variety
  - the average speech quantity used per speaker for one training session, and if relevant, the average percentage of effective training speech quantity, i.e. the proportion of training speech which is actually used to build the registered speaker models.
  - the average speech quantity used per speaker for one test session, and if relevant, the average percentage of effective speech quantity per test session, i.e. the proportion of test speech which is actually used to identify or verify the speaker, in test mode.
  - the qualitative description or characterisation of the training and test linguistic content.
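
As an illustration of the two quantity figures above, the sketch below computes the average speech quantity per session and the average percentage of effective speech. It assumes that each session is described by a hypothetical pair `(recorded_seconds, used_seconds)` and that "average percentage" means the mean of the per-session percentages; both are assumptions of this example rather than definitions from the text.

```python
from statistics import mean


def speech_quantity_report(sessions):
    """Average quantity and average effective percentage per session.

    `sessions` is a list of (recorded_seconds, used_seconds) pairs, one per
    session (hypothetical layout). The list must contain at least one
    session with a non-zero recorded duration.
    """
    recorded = [r for r, _ in sessions]
    # Per-session proportion of speech actually used, as a percentage.
    effective_pct = [100.0 * u / r for r, u in sessions if r > 0]
    return {
        "avg_quantity_s": mean(recorded),
        "avg_effective_pct": mean(effective_pct),
    }


quantity = speech_quantity_report([(60, 45), (40, 40)])
```

The same function can be applied separately to training sessions and to test sessions, since the two figures are reported independently.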
 
- Speaker population size and typology
  - for speaker identification, the registered speaker population size, i.e. the number of registered users; this figure can also be reported for speaker verification experiments, but only as an indication of the statistical validity of the results.
  - the proportion of male and female registered speakers as well as any other relevant characteristics of the typology of registered speakers, when known; in particular, concerning the age, the dialectal origin, whether they are native or non-native speakers, etc. In parallel, any geographical, physiological, psychological or sociological feature that would be common to the registered population members (or to a majority of them) should be identified and reported.
  - for speaker verification (and open-set identification), the origin of pseudo-impostors, i.e. whether they are chosen among the registered speakers or among an external pseudo-impostor population; in the latter case, the number of external pseudo-impostors, the proportion of male and female speakers, the population typology, the speech quantity per session (per pseudo-impostor) and the number of training sessions (per pseudo-impostor) should be reported.
  - for speaker verification (and open-set identification), the origin of test impostors, i.e. whether they are chosen among the registered speakers (but claiming a false identity), among the pseudo-impostors, or among an external test impostor population. The last approach is by far the most realistic. However, when it is not feasible, an impostor utterance should never be tested against a registered speaker whose bundle of pseudo-impostors contains the test impostor.
  - when an external test impostor population is used, the number of external test impostors, the proportion of male and female speakers, the population typology, and in particular how their profile differs from the registered population and from the pseudo-impostor population should be described. The speech quantity per session (per test impostor) should be reported, as an indication of the statistical confidence of the evaluation results.
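
The restriction on test impostors drawn from the registered or pseudo-impostor populations can be enforced when the trial list is built. The sketch below is one possible way to do so, assuming a hypothetical mapping `bundles` from each registered speaker to the set of pseudo-impostor identifiers used for that speaker, and a list of candidate `(impostor, claimed_identity)` trials; the names are illustrative only.

```python
def valid_impostor_trials(trials, bundles):
    """Keep only the impostor trials that respect the bundle restriction.

    `trials` is a list of (impostor_id, claimed_id) pairs and `bundles`
    maps each registered speaker to the set of pseudo-impostor ids used
    for that speaker (both layouts are assumptions of this sketch).
    """
    kept = []
    for impostor, claimed in trials:
        if impostor == claimed:
            # A speaker claiming their own identity is not an impostor trial.
            continue
        if impostor in bundles.get(claimed, set()):
            # The impostor belongs to the claimed speaker's pseudo-impostor
            # bundle, so this trial must not be run.
            continue
        kept.append((impostor, claimed))
    return kept


bundles = {"s1": {"s2", "x1"}, "s2": {"s3"}, "s3": set()}
candidates = [("s2", "s1"), ("s3", "s1"), ("s3", "s2"), ("s1", "s2"), ("s1", "s1")]
kept_trials = valid_impostor_trials(candidates, bundles)
```

Reporting the number of candidate trials removed by this filter, alongside the number retained, gives the reader an indication of how far the restriction reduces the evaluation set.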
 
- Speaker purpose and other human factors
  - the general purpose of the system, and whether the evaluation data are adequate for this purpose.
  - the intention of registered speakers, i.e. whether, in the test mode, they are cooperative speakers, uncooperative speakers, or whether they behave as casual registered speakers.
  - the intention of impostors, i.e. whether they are well-intentioned impostors, casual impostors or intentional impostors. In the case of intentional impostors, the amount of knowledge they have about the genuine speaker should be specified, in particular whether they are familiar with the voice of the genuine speaker, and whether or not they are provided with the password, for text-dependent systems.
  - the impostor test configuration, i.e. the simulated (or real-life) strategy of an impostor in choosing which identity to claim. Examples are the exhaustive attempt, in which each impostor tries each registered identity, and the selective attempt, in which a certain criterion guides the impostor's choice, this criterion being stipulated. With laboratory recordings of casual speakers, we recommend the same-sex selective attempt configuration and the cross-sex selective attempt configuration, to which can be added the selective attempt towards the nearest registered speaker, especially for comparative evaluation of two systems on the same database.
  - the stakes of the system, i.e. what are the sources of motivation for registered speakers to be recognised (or not recognised), and those of an impostor to be accepted (or rejected).
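
The impostor test configurations named above can be made explicit by generating the trial list from the speaker populations. The following sketch is one possible reading, assuming that speakers are given as hypothetical dictionaries mapping an identifier to a sex label, that a same-sex (or cross-sex) selective attempt means each impostor claims every registered identity of the same (or opposite) sex, and that "nearest" is defined by a caller-supplied `distance` function. None of these choices is fixed by the text.

```python
def impostor_trials(impostors, registered, config, distance=None):
    """Build (impostor_id, claimed_id) trials for a given configuration.

    `impostors` and `registered` map speaker ids to a sex label such as
    "f" or "m" (hypothetical layout). `config` is one of "exhaustive",
    "same-sex", "cross-sex" or "nearest". For "nearest", `distance` must
    be a callable taking (impostor_id, registered_id).
    """
    trials = []
    for imp, imp_sex in impostors.items():
        # An impostor never claims their own identity.
        targets = [r for r in registered if r != imp]
        if config == "exhaustive":
            chosen = targets
        elif config == "same-sex":
            chosen = [r for r in targets if registered[r] == imp_sex]
        elif config == "cross-sex":
            chosen = [r for r in targets if registered[r] != imp_sex]
        elif config == "nearest":
            if distance is None:
                raise ValueError("the 'nearest' configuration needs a distance")
            nearest = min(targets, key=lambda r: distance(imp, r), default=None)
            chosen = [] if nearest is None else [nearest]
        else:
            raise ValueError(f"unknown configuration: {config}")
        trials.extend((imp, r) for r in chosen)
    return trials


registered = {"r1": "f", "r2": "m", "r3": "f"}
impostors = {"i1": "f", "i2": "m"}

exhaustive = impostor_trials(impostors, registered, "exhaustive")
same_sex = impostor_trials(impostors, registered, "same-sex")
cross_sex = impostor_trials(impostors, registered, "cross-sex")


def toy_distance(a, b):
    # Toy distance for the example only: difference of the id digits.
    return abs(int(a[1:]) - int(b[1:]))


nearest = impostor_trials(impostors, registered, "nearest", toy_distance)
```

Whatever the configuration, the criterion used to select the claimed identities should be stated in the evaluation report, so that the trial list can be reproduced.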
 
 
 
 
 
 
 
 
 
EAGLES SWLG SoftEdition, May 1997.