When assessing recogniser performance, precautions need to be taken to ensure that the speakers used constitute a simple random sample. Suppose only a small group of speakers is available and the set includes one speaker who is markedly different from the rest. If the recognisers are trained on a subset of the speakers and tested on the remainder, then whenever the atypical speaker is included in the training set, that speaker is excluded from the test set. Measured performance will then be reasonably good, even though the trained model is not particularly good because its training data included the atypical speaker. Conversely, when the atypical speaker is not included in the training set, the model will be good. When that model is tested, however, it will perform worse than the previous one because the test set now includes the atypical speaker. Thus, it is possible to obtain better baseline performance for a poor model than for a good model. It should be clear that simply omitting this speaker is not a solution, since doing so reduces the chance of the sample being simple random. The only rigorous way round this problem is to ensure that the training and test data contain sufficient numbers of speakers to minimise the effects of atypical speakers (Section 9.2).
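The effect described above can be reproduced with a deliberately artificial sketch. It assumes that each speaker is summarised by a single number, that the "model" is simply the mean of the training speakers, and that the test error is the mean absolute distance of the test speakers from that model. None of this corresponds to a real recogniser; the speaker names, values and the function `evaluate_splits` are hypothetical and serve only to show how the position of the atypical speaker drives the score.

```python
from itertools import combinations
from statistics import mean


def evaluate_splits(speakers, n_test):
    """Try every train/test partition of the speakers.

    speakers: dict mapping speaker name -> a single toy "voice" value
              (an assumption made purely for illustration).
    Returns a list of (test_names, model, error) tuples, where the model
    is the mean of the training values and the error is the mean absolute
    distance of the test speakers from that model.
    """
    names = sorted(speakers)
    results = []
    for test_names in combinations(names, n_test):
        train_values = [speakers[n] for n in names if n not in test_names]
        model = mean(train_values)
        error = mean([abs(speakers[n] - model) for n in test_names])
        results.append((test_names, model, error))
    return results


# Four typical speakers and one atypical speaker ("odd"); toy values.
speakers = {"s1": 1.0, "s2": 1.0, "s3": 1.0, "s4": 1.0, "odd": 5.0}
results = evaluate_splits(speakers, n_test=1)

for test_names, model, error in results:
    print(test_names, "model =", model, "test error =", error)
```

In this toy setting the model trained without the atypical speaker sits exactly on the typical speakers, yet it receives the largest test error, because the atypical speaker is then the one being tested. The models that absorbed the atypical speaker are displaced from the typical speakers but are scored only against a typical one, so their error is smaller.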
The basic data for assessing recogniser performance have a similar structure to those obtained during segmentation and classification. For the recogniser, a set of time-varying parameters of the speech is obtained. Recognisers such as those based on ANNs take as input frames which are usually of fixed length and comparatively crudely quantised. Such a recogniser outputs a classification of each frame or group of frames. A segment boundary occurs at each point where the classification changes. Thus, there are two basic measures that can be compared with human judgments about what the passage contains: the relative position of segment boundaries, and the correspondence between the classifications of the segments. Scoring metrics have been developed which are intended to measure the latter but, once again, they suffer from problems that may stem from implicit segmentation errors.
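A minimal sketch of the two measures follows, assuming that both the recogniser output and the human reference are available as one label per frame. The function names, the tolerance parameter and the greedy one-to-one boundary matching are illustrative assumptions, not an established scoring metric.

```python
def boundaries(frame_labels):
    """Frame indices at which the classification changes."""
    return [i for i in range(1, len(frame_labels))
            if frame_labels[i] != frame_labels[i - 1]]


def matched_boundaries(hyp, ref, tolerance):
    """Count hypothesised boundaries lying within `tolerance` frames of a
    reference boundary. Each reference boundary is used at most once
    (greedy matching; an assumption made for this sketch)."""
    unused = list(ref)
    hits = 0
    for b in hyp:
        candidates = [r for r in unused if abs(r - b) <= tolerance]
        if candidates:
            nearest = min(candidates, key=lambda r: abs(r - b))
            unused.remove(nearest)
            hits += 1
    return hits


def frame_agreement(hyp_labels, ref_labels):
    """Proportion of frames given the same label by both sources."""
    if len(hyp_labels) != len(ref_labels):
        raise ValueError("label sequences must have the same length")
    same = sum(h == r for h, r in zip(hyp_labels, ref_labels))
    return same / len(ref_labels)


# Toy data: same label sequence, first boundary displaced by one frame.
ref_labels = ["sil"] * 3 + ["s"] * 4 + ["iy"] * 5
hyp_labels = ["sil"] * 4 + ["s"] * 3 + ["iy"] * 5

ref_b = boundaries(ref_labels)
hyp_b = boundaries(hyp_labels)
within_one = matched_boundaries(hyp_b, ref_b, tolerance=1)
exact = matched_boundaries(hyp_b, ref_b, tolerance=0)
agreement = frame_agreement(hyp_labels, ref_labels)
```

In the toy data the recogniser assigns the correct sequence of labels, yet the frame agreement falls below 1 solely because one boundary is displaced by a frame. This is the sense in which a classification score can be contaminated by implicit segmentation errors.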