The motivation with which a speaker uses a system also considerably influences its performance profile. We first describe a possible typology of applicant speakers with respect to their objectives, and then mention other relevant human factors.
When the user's goal is consistent with the purpose of the system, a cooperative (registered) speaker can be defined as an authorised applicant who is willing to be identified, or as a genuine speaker who intends to be verified positively. Their counterpart in the impostor population would be a well-intentioned impostor, i.e. an impostor whose goal is to be rejected.
When the user's goal and the system's purpose are opposed, an uncooperative (registered) speaker knows that he is being verified but wants the system to reject him. For instance, an uncooperative speaker is likely to use natural or artificial voice masking in order to remain anonymous. In contrast, an intentional impostor has the clear goal of being identified or verified even though he is not registered (violation), or of being identified as somebody else (usurpation).
Here, a distinction must be made among intentional impostors depending on whether or not they have previously been in contact with the voice of the authentic user whose identity they are claiming. We propose the term acquainted impostor to qualify an intentional impostor who has some knowledge of the voice of the authorised speaker, as opposed to an unacquainted impostor, who has never been in contact with the genuine user. The degree of success of an acquainted intentional impostor will ultimately depend on his imitation skills.
The term casual impostor is often used to qualify speakers who serve as impostors in an evaluation, but who were not recorded with the explicit instruction to try to defeat the system. In the same way, the term casual registered speakers can be used to refer to a population of registered speakers who have not received an explicit instruction to succeed in being identified or verified positively.
Here again, variants appear, depending on how the experimenter chooses the claimed identity of a casual impostor in a verification experiment. A casual impostor can be tested systematically against all registered users, against all other registered speakers of the same sex, against all other registered speakers of the opposite sex, against k registered speakers chosen at random, against the k nearest neighbours in the registered population, etc.
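The following sketch illustrates how such impostor trial lists might be assembled in practice. It assumes a hypothetical list of registered speaker records carrying an identifier and a sex label; the function names, the record layout and the random choice of k targets are illustrative placeholders, not part of any standard protocol.

```python
import random

# Hypothetical registered-speaker records: (speaker_id, sex)
registered = [("spk01", "F"), ("spk02", "M"), ("spk03", "F"), ("spk04", "M")]

def exhaustive(impostor_sex):
    """Exhaustive attempt: test the impostor utterance against every registered identity."""
    return [spk_id for spk_id, _ in registered]

def same_sex(impostor_sex):
    """Selective attempt: only claimed identities of the impostor's own sex."""
    return [spk_id for spk_id, sex in registered if sex == impostor_sex]

def cross_sex(impostor_sex):
    """Selective attempt: only claimed identities of the opposite sex."""
    return [spk_id for spk_id, sex in registered if sex != impostor_sex]

def k_random(impostor_sex, k=2):
    """Claim k registered identities chosen at random."""
    return random.sample([spk_id for spk_id, _ in registered], k)

# Example: trial lists for a female casual impostor utterance
print(same_sex("F"))   # -> ['spk01', 'spk03']
print(cross_sex("F"))  # -> ['spk02', 'spk04']
```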
Whereas, to a first approximation, a population of casual registered speakers may be relatively representative of a population of cooperative registered speakers, no test protocol using casual impostors can accurately approximate the behaviour of intentional impostors. In practice, a real impostor could try to vary his voice characteristics for a fixed claimed identity over successive trials, until he succeeds in defeating the system, gives up, or the system blacklists the genuine user. Or he may try as many registered identities as he can with his natural voice or a disguised voice, until he succeeds, gives up, or the police arrive!
However, most laboratory evaluations use speech databases which have usually not been recorded in a real-world situation. Therefore they accurately model neither cooperativeness nor intentional imposture, and the impostor speakers are casual impostors. A frequent practice is to use an exhaustive attempt test configuration, in which each impostor is successively tested against each registered speaker. We suggest adopting a slightly different approach. Two distinct experiments should in fact be carried out: one in which each casual impostor utterance is tested against all registered identities of the same sex, and a second in which each casual impostor utterance is tested against all registered identities of the opposite sex. The first experiment permits estimation of the rejection ability of a system towards unacquainted intentional impostors who would know the sex of the genuine speaker, even though casual impostors behave almost like well-intentioned impostors. The second experiment tests whether the system is really robust to cross-sex imposture. We will refer to these configurations as a selective attempt against all same-sex speakers and a selective attempt against all cross-sex speakers respectively. To a first approximation, the proportion of successful violations does not depend on the number of registered speakers.
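As an illustration, the proportion of successful violations under the two selective attempt configurations could be estimated as sketched below. The per-trial scores, the decision threshold and the trial record layout are purely hypothetical placeholders for whatever the system under evaluation actually produces.

```python
# Hypothetical impostor trials: (impostor_sex, claimed_speaker_sex, verification_score)
trials = [
    ("F", "F", 0.71), ("F", "F", 0.34), ("F", "M", 0.12),
    ("M", "M", 0.55), ("M", "F", 0.08), ("M", "M", 0.62),
]
THRESHOLD = 0.5  # assumed acceptance threshold of the verification system

def acceptance_rate(selected):
    """Proportion of impostor trials wrongly accepted, i.e. successful violations."""
    if not selected:
        return 0.0
    return sum(score >= THRESHOLD for _, _, score in selected) / len(selected)

same_sex_trials  = [t for t in trials if t[0] == t[1]]
cross_sex_trials = [t for t in trials if t[0] != t[1]]

print("same-sex selective attempt :", acceptance_rate(same_sex_trials))
print("cross-sex selective attempt:", acceptance_rate(cross_sex_trials))
```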
In addition, testing each impostor utterance against its nearest neighbour in the registered population can give an indication of the system's robustness against intentional imposture. However, the result will be directly influenced by the size of the registered speaker population, so this approach is only meaningful in the framework of a comparative evaluation on a common database. It can be qualified as a selective attempt towards the nearest registered neighbour. Other selective attempts are possible, for instance towards speakers of the same age class.
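A nearest-neighbour selective attempt could be scripted as follows, assuming some inter-speaker distance; here a placeholder Euclidean distance between hypothetical per-speaker feature vectors is used, whereas a real evaluation would derive the distance from the modelling used by the system under test.

```python
import math

# Hypothetical per-speaker feature vectors (e.g. mean cepstral vectors)
models = {"spk01": [1.0, 0.2], "spk02": [0.4, 0.9], "spk03": [1.1, 0.3]}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_registered_neighbour(impostor_vector, impostor_id=None):
    """Claimed identity = the registered speaker closest to the impostor utterance."""
    candidates = {spk: vec for spk, vec in models.items() if spk != impostor_id}
    return min(candidates, key=lambda spk: euclidean(candidates[spk], impostor_vector))

# Example: an impostor utterance summarised by the vector [1.08, 0.28]
print(nearest_registered_neighbour([1.08, 0.28]))  # -> 'spk03'
```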
To summarise, registered speakers should be qualified as cooperative, casual or uncooperative, whereas a distinction should be made between well-intentioned, casual and (acquainted/unacquainted) intentional impostors. Only field data can provide realistic instances of user behaviour.
Additionally, the general motivation and behaviour of the users can have an impact on the performance of a system: for instance, the stakes of a successful identification or verification, the benefits of an imposture, the users' general attitude towards voice technology, etc. In an evaluation, all these aspects influence the motivation of the user, and therefore the interpretation of the results.