Transformation of speech databases


Speaker recognition systems are sensitive to external factors such as noise and transducer and channel characteristics. Performance is also influenced by intra-speaker variability, i.e. all kinds of unintentional or intentional modifications of the speaker's voice. Among the most common unintentional modifications are changes in health condition and perturbations in speech production caused by the environment, such as surrounding noise. As for intentional voice modifications, one can distinguish between voice masking and mimicry, depending on the goal of the speaker.

It is virtually impossible to construct a database that is representative of all combinations of these factors, varied in steps small enough to cover any imaginable situation. The idea behind indirect assessment [SAM-A (1993)] is twofold: first, to develop realistic models of these factors; second, to apply these models to pre-existing databases and vary them in a controlled manner. For each factor, the limits of acceptable variation can then be measured, providing a sensitivity profile for the system under test.
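As an illustration only, the procedure can be sketched for a single external factor. The sketch below assumes additive white Gaussian noise as the factor model and signal-to-noise ratio (SNR) as the controlled parameter; neither choice is prescribed by the text. The names `add_noise_at_snr`, `sensitivity_profile`, `acceptable_limit` and the `evaluate` callback (standing in for the system under test) are hypothetical.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng):
    """Return a copy of `signal` (non-empty list of floats) with white
    Gaussian noise added so that the expected SNR is `snr_db` decibels.
    Assumption: additive white noise is a toy model of one external factor."""
    power = sum(x * x for x in signal) / len(signal)
    noise_power = power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def sensitivity_profile(database, evaluate, snr_levels_db, seed=0):
    """Apply the noise model to a pre-existing database at each SNR level
    and record the score that `evaluate` returns for the degraded copy.

    `database` is a list of (signal, label) pairs; `evaluate` is a
    hypothetical callback wrapping the system under test."""
    profile = {}
    for snr_db in snr_levels_db:
        rng = random.Random(seed)  # same seed per level, for repeatability
        degraded = [(add_noise_at_snr(sig, snr_db, rng), label)
                    for sig, label in database]
        profile[snr_db] = evaluate(degraded)
    return profile


def acceptable_limit(profile, min_score):
    """Walk from the cleanest to the noisiest condition and return the last
    SNR level reached before the score first drops below `min_score`
    (None if even the cleanest condition fails)."""
    limit = None
    for snr_db in sorted(profile, reverse=True):
        if profile[snr_db] < min_score:
            break
        limit = snr_db
    return limit
```

Repeating the sweep with a different transformation (for instance a channel filter) and a different controlled parameter would give one such profile per factor, which is the sense in which the profile characterises the system under test.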

At the current stage of knowledge, it is not possible to develop a viable model for every factor mentioned above. In fact, external factors appear to be easier to simulate than intra-speaker variability. Nevertheless, we believe that indirect assessment through transformations of speech databases is a challenging research topic whose outcomes should contribute to simplifying assessment procedures considerably.

