Speaker recognition systems are sensitive to external factors such as noise and transducer and channel characteristics. Performance is also influenced by intra-speaker variability, i.e. all kinds of unintentional or intentional modifications of the speaker's voice. Among the most common unintentional modifications are changes in health condition and perturbations in speech production caused by the environment, such as surrounding noise. As for intentional voice modifications, one can distinguish between voice masking and mimicry, depending on the goal of the speaker.
It is virtually impossible to construct a database that is representative of all combinations of these factors, varied in steps small enough to cover every imaginable situation. The idea behind indirect assessment [SAM-A (1993)] is twofold: first, to develop realistic models of these factors; then, to apply these models to pre-existing databases and vary them in a controlled manner. For each factor, the limits of acceptable variation can be measured, providing a sensitivity profile for the system under test.
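As an illustration only, the procedure can be sketched for a single external factor, additive noise. The sketch below assumes a white Gaussian noise model and treats the system under test as an opaque scoring function; the names add_noise_at_snr, sensitivity_profile, acceptable_limit and the evaluate callback are hypothetical and do not come from [SAM-A (1993)].

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def add_noise_at_snr(signal, snr_db, rng):
    """Return the signal plus white Gaussian noise at the requested SNR (dB).

    Assumption: white Gaussian noise stands in for a realistic noise model.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Scale the noise so that 10*log10(P_signal / P_noise) equals snr_db.
    target_noise_power = power(signal) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / power(noise))
    return [s + gain * n for s, n in zip(signal, noise)]


def sensitivity_profile(evaluate, database, snr_levels_db, seed=0):
    """Score the system on the database degraded at each SNR level.

    `evaluate` is a hypothetical callback: it receives the list of degraded
    utterances and returns a single score (higher is better).
    """
    profile = {}
    for snr_db in snr_levels_db:
        # Re-seed so every level uses the same noise realisation and
        # only the noise gain changes from one level to the next.
        rng = random.Random(seed)
        degraded = [add_noise_at_snr(utt, snr_db, rng) for utt in database]
        profile[snr_db] = evaluate(degraded)
    return profile


def acceptable_limit(profile, min_score):
    """Lowest tested SNR whose score reaches min_score, or None if none does."""
    passing = [snr for snr, score in profile.items() if score >= min_score]
    return min(passing) if passing else None
```

Calling sensitivity_profile with a list of utterances (each a list of samples) and a set of SNR levels yields one score per level; acceptable_limit then reads off the harshest tested condition that still meets a chosen score threshold. Other factors, such as channel characteristics, would need their own transformation in place of add_noise_at_snr.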
At the present stage of knowledge, it is certainly not possible to develop a viable model for each of the factors mentioned above. In fact, external factors appear to be easier to simulate than intra-speaker variability. Nevertheless, we believe that indirect assessment through transformations of speech databases is a challenging research topic, the outcomes of which should help to simplify assessment procedures considerably.