For many applications of speaker recognition techniques, a wrong decision has only material consequences. In forensic applications, the consequences of an error are far more serious.
No rigorous scientific protocol has so far demonstrated the existence of a fixed, robust, non-modifiable, individual voice characteristic that could be extracted from a speech signal and would indicate the speaker's identity beyond doubt. We therefore recommend using the term voice signature in place of the frequently used voice print: voice signature better conveys the idea of variability and intentionality, whereas voice print is misleading because it suggests an analogy with fingerprints.
Speech databases specific to the forensic area should be carefully designed, recorded and distributed. Typical forensic situations should be simulated, including cases of voice disguise. Such a database could consist, for instance, of a large collection of pairs of voice samples, some pairs from the same speaker and some from different speakers. One recording of each pair could be made in a studio, while the other would be subjected to several kinds of environmental, channel, artificial and intentional distortions. The database would be calibrated in such a way that the average listener cannot do better than chance. Such databases would be extremely valuable for several purposes. One would be to evaluate forensic speech experts and clarify whether they can really do much better than an average listener. Another would be to show forensic professionals clearly what the limitations of human and automatic speaker recognition techniques are for such applications.
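As an illustration only, the comparison against chance described above could be scored along the following lines. This is a minimal sketch under stated assumptions, not a proposed protocol: the function names `accuracy` and `p_value_above_chance` are hypothetical, each trial is assumed to be a same-speaker/different-speaker judgement on one pair, and the chance level of 0.5 assumes that same-speaker and different-speaker pairs are equally frequent in the test set.

```python
from math import comb


def accuracy(truth, answers):
    """Fraction of pairs judged correctly.

    truth[i] is True when pair i comes from the same speaker;
    answers[i] is the listener's (or system's) same/different judgement.
    """
    if len(truth) != len(answers) or not truth:
        raise ValueError("truth and answers must be non-empty and equal in length")
    correct = sum(t == a for t, a in zip(truth, answers))
    return correct / len(truth)


def p_value_above_chance(n_correct, n_trials, chance=0.5):
    """One-sided exact binomial test.

    Probability of obtaining at least n_correct correct judgements out of
    n_trials if every judgement were a guess with success rate `chance`.
    ASSUMPTION: chance=0.5 is only appropriate for a balanced set of pairs.
    """
    return sum(
        comb(n_trials, k) * chance**k * (1 - chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical answers for eight pairs, for illustration only.
    truth = [True, False, True, True, False, False, True, False]
    answers = [True, False, False, True, False, True, True, False]
    n_correct = sum(t == a for t, a in zip(truth, answers))
    print("accuracy:", accuracy(truth, answers))
    print("p-value vs. chance:", p_value_above_chance(n_correct, len(truth)))
```

A small p-value for an expert, alongside a large one for average listeners on the same pairs, would be one way of quantifying whether the expert really does better; with few trials the test has little power, which is one reason a large collection of pairs is needed.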