An open-set identification system can be viewed as a function which assigns to any test utterance z an estimated speaker index, corresponding to the identified speaker in the set of registered speakers, or outputs 0 if the applicant speaker is considered an impostor.
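To make this definition concrete, here is a minimal Python sketch of such a decision function. It assumes, purely for illustration, that the system produces one similarity score per registered speaker (higher meaning a better match) and rejects the utterance when the best score falls below a threshold; the name `identify_open_set` and its parameters are hypothetical and do not come from any particular system.

```python
from typing import Dict


def identify_open_set(scores: Dict[int, float], threshold: float) -> int:
    """Return the estimated speaker index, or 0 for a presumed impostor.

    `scores` maps each registered speaker index (1, 2, ...) to a similarity
    score for the test utterance. Both the score-based design and the
    "higher is better" convention are assumptions of this sketch.
    """
    if not scores:
        raise ValueError("at least one registered speaker is required")
    # Closed-set step: pick the best-matching registered speaker.
    best_speaker = max(scores, key=lambda speaker: scores[speaker])
    # Open-set step: reject if even the best match is below the threshold.
    if scores[best_speaker] < threshold:
        return 0
    return best_speaker
```

With a very low threshold the function never returns 0 and therefore behaves like a closed-set identifier.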
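In open-set identification, three types of error can be distinguished: a misclassification error (a registered speaker is identified as another registered speaker), a false acceptance (an impostor is accepted as a registered speaker), and a false rejection (a registered speaker is rejected as an impostor).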
Here, two points of view can be adopted.
The first is to consider a misclassification error as a false acceptance (while a correct identification is treated as a true acceptance). In this case, open-set identification can be scored in the same way as verification, namely by evaluating a false rejection rate and a false acceptance rate. The concept of the ROC curve can be extended to this family of systems and, in particular, an equal error rate can be computed. However, when the threshold tends to 0, the false acceptance rate is now bounded by a limiting value, namely the closed-set misclassification rate of the system, i.e. the performance that the open-set identification system would provide if it were operating in closed-set mode. Therefore, a parametric approach to dynamic evaluation would require a specific class of ROC curve models (with at least two parameters). Moreover, merging classification errors with false acceptances may not be appropriate if the two types of error are not equally harmful.
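As an illustration of the equal error rate mentioned above, the following Python sketch estimates it from a handful of sampled operating points. The function name `equal_error_rate` and the representation of the ROC curve as a list of (false acceptance rate, false rejection rate) pairs are assumptions made for this example; the sketch simply picks the sampled point where the two rates are closest and does not interpolate.

```python
from typing import Sequence, Tuple


def equal_error_rate(points: Sequence[Tuple[float, float]]) -> float:
    """Approximate the EER from (false_acceptance, false_rejection) pairs.

    Each pair is one operating point obtained at some threshold. The sampled
    point where the two rates are closest is selected, and the mean of its
    two rates is returned. No interpolation between points is attempted.
    """
    if not points:
        raise ValueError("at least one operating point is required")
    fa, fr = min(points, key=lambda p: abs(p[0] - p[1]))
    return (fa + fr) / 2.0
```

A finer threshold grid gives a closer estimate; with a coarse grid the returned value is only as good as the nearest sampled point.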
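An alternative solution is to keep the three types of error distinct and to measure them by three rates: a misclassification rate, a false acceptance rate and a false rejection rate. The ROC curve is now a curve in the three-dimensional space of these rates, traced out as the decision threshold varies. Its two extremities are the point where every applicant is rejected (false rejection rate 1, false acceptance and misclassification rates 0) and the point where every applicant is accepted (false rejection rate 0, false acceptance rate 1, misclassification rate equal to the closed-set misclassification rate). The curve can be projected onto two planes, giving two functions f and g. The first projection, f, relates the false rejection rate to the false acceptance rate; it is monotonically decreasing, with f(0) = 1 and f(1) = 0. The second projection, g, gives the misclassification rate as a function of the false rejection rate; it is also monotonically decreasing, and satisfies g(1) = 0, with g(0) equal to the closed-set misclassification rate. A minimal description of the curve could then be the equal error rate of function f and the closed-set identification score of function g. Parametric models of the curve with two degrees of freedom could be envisaged but, to our knowledge, this remains an unexplored research topic.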
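The three-rate view can be sketched in Python as follows. This is an illustrative example under stated assumptions, not a reference scoring procedure: each trial is represented as a hypothetical `(true_speaker, best_speaker, best_score)` triple with `true_speaker` equal to 0 for an impostor, the misclassification and false rejection rates are normalised by the number of registered-speaker trials, and the false acceptance rate by the number of impostor trials.

```python
from typing import List, Tuple

# One trial: (true_speaker, best_matching_speaker, best_score).
# true_speaker is 0 for an impostor; indices >= 1 are registered speakers.
Trial = Tuple[int, int, float]


def open_set_rates(trials: List[Trial], threshold: float) -> Tuple[float, float, float]:
    """Return (misclassification, false_acceptance, false_rejection) rates.

    The normalisation used here (registered-speaker trials for the first and
    third rates, impostor trials for the second) is an assumption of this
    sketch.
    """
    n_registered = sum(1 for true, _, _ in trials if true != 0)
    n_impostor = len(trials) - n_registered
    if n_registered == 0 or n_impostor == 0:
        raise ValueError("need both registered-speaker and impostor trials")

    misclassified = accepted_impostors = rejected = 0
    for true, best, score in trials:
        accepted = score >= threshold
        if true == 0:
            if accepted:
                accepted_impostors += 1  # impostor let through
        elif not accepted:
            rejected += 1                # registered speaker turned away
        elif best != true:
            misclassified += 1           # accepted under the wrong identity
    return (misclassified / n_registered,
            accepted_impostors / n_impostor,
            rejected / n_registered)
```

Evaluating `open_set_rates` over a range of thresholds yields sampled points of the three-dimensional curve; the two limiting thresholds (accept everything, reject everything) give its extremities.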
Of the two possibilities, we believe that the second is to be preferred, although it is slightly more complex.