For continuous speech, it is less straightforward to decide which recognition errors should be counted as ``deletions'', ``substitutions'' and ``insertions''. The process by which this assignment is made is called alignment. If the recogniser's segmentation is available, i.e. if the start and end times of each recognised word are known, the alignment can be done in a way comparable to the assessment of isolated word recognisers.
Generally, such timing (labelling) information is not available in the recognition output. In that case, the alignment process uses a dynamic programming algorithm to minimise the misalignment between two strings of words (symbols): the reference sentence and the recognised sentence. The resulting alignment depends on the relative weights assigned to substitutions, insertions and deletions. [Hunt (1990)] discusses the theory of word-symbol alignment and analyses some experiments on alignment.
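The dynamic programming alignment described above can be sketched as follows. This is a minimal illustration assuming a standard weighted edit-distance (Levenshtein) formulation; the function names `align` and `count_errors` and the default unit weights are hypothetical choices, not taken from the NIST software or from [Hunt (1990)].

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def align(ref: List[str], hyp: List[str],
          w_sub: float = 1.0, w_ins: float = 1.0, w_del: float = 1.0) -> List[Pair]:
    """Minimum-cost alignment of a reference and a recognised word string.

    Returns (ref_word, hyp_word) pairs: (w, None) is a deletion,
    (None, w) an insertion, and (a, b) with a != b a substitution.
    The weights are illustrative assumptions, not prescribed values.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimal cost of aligning ref[:i] with hyp[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + w_del  # delete all of ref[:i]
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + w_ins  # insert all of hyp[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (0.0 if ref[i - 1] == hyp[j - 1] else w_sub)
            cost[i][j] = min(diag, cost[i - 1][j] + w_del, cost[i][j - 1] + w_ins)

    # Trace back from the bottom-right cell, preferring match/substitution,
    # then deletion, then insertion when several moves give the same cost.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (
                0.0 if ref[i - 1] == hyp[j - 1] else w_sub):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + w_del:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def count_errors(pairs: List[Pair]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) for an alignment."""
    sub = sum(1 for r, h in pairs if r is not None and h is not None and r != h)
    dele = sum(1 for r, h in pairs if h is None)
    ins = sum(1 for r, h in pairs if r is None)
    return sub, dele, ins


ref = "the cat sat on the mat".split()
hyp = "the cat sat the mat too".split()
print(count_errors(align(ref, hyp)))  # one deletion ("on"), one insertion ("too")
```

The weights matter because different alignments can have equal or near-equal cost: with `w_sub` larger than `w_ins + w_del`, for example, the sketch never reports a substitution and instead scores each mismatch as a deletion plus an insertion.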
NIST has developed freely available software for analysing the performance of continuous speech recognition systems. It consists of two parts: an alignment program and a statistics package.
The alignment can be performed both at the word level and at the phone level (so-called phonetic alignment), provided that the pronunciation dictionary is available. Because it is a standard alignment procedure, it is recommended for competitive assessment.
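The difference between word-level and phone-level alignment can be illustrated with a small sketch: each word string is expanded into phones by dictionary lookup before the same kind of string alignment is applied. The toy lexicon `LEXICON`, its phone symbols, and the helper names `to_phones` and `edit_distance` are hypothetical and serve only as an illustration; this is not the NIST program's dictionary format or interface.

```python
from typing import Dict, List

# Hypothetical toy pronunciation dictionary (ARPAbet-like symbols, illustration only).
LEXICON: Dict[str, List[str]] = {
    "the": ["dh", "ax"],
    "cat": ["k", "ae", "t"],
    "cut": ["k", "ah", "t"],
    "sat": ["s", "ae", "t"],
}


def to_phones(words: List[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Concatenate the dictionary pronunciations of a word string."""
    phones: List[str] = []
    for w in words:
        if w not in lexicon:
            raise KeyError(f"no pronunciation for {w!r}")
        phones.extend(lexicon[w])
    return phones


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Unit-cost Levenshtein distance (substitution = insertion = deletion = 1)."""
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cur[j] = min(prev[j - 1] + (ref[i - 1] != hyp[j - 1]),  # match / substitution
                         prev[j] + 1,                               # deletion
                         cur[j - 1] + 1)                            # insertion
        prev = cur
    return prev[-1]


ref_words = "the cat sat".split()
hyp_words = "the cut sat".split()
ref_phones = to_phones(ref_words, LEXICON)
hyp_phones = to_phones(hyp_words, LEXICON)
print(edit_distance(ref_words, hyp_words), "error(s) in", len(ref_words), "words")
print(edit_distance(ref_phones, hyp_phones), "error(s) in", len(ref_phones), "phones")
```

In this example the word-level alignment counts a full word substitution, while the phonetic alignment shows that only one of eight phones differs.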
The software was developed for the ARPA evaluations, but it has been designed to be generally applicable. The alignment program generates a (binary) file containing all alignment information, which another utility can print at various levels of detail. Results can be compiled both overall and per speaker. The statistics program can compare the results of different recognition systems pairwise and decide whether the difference in performance between them is significant, using four different statistical tests.
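Per-speaker scoring and a pairwise significance decision can be sketched as below. The four tests used by the NIST statistics program are not specified here, so this example uses one simple paired test, a two-sided exact sign test over per-speaker error rates, purely as an illustration; it is not claimed to reproduce the NIST implementation. The input format (speaker mapped to error count and number of reference words) and all function names are hypothetical, and both systems are assumed to be scored on the same speakers.

```python
from math import comb
from typing import Dict, Tuple

# Hypothetical input format: speaker -> (errors, reference_words)
Counts = Dict[str, Tuple[int, int]]


def speaker_error_rates(counts: Counts) -> Dict[str, float]:
    """Per-speaker error rate: errors divided by reference words."""
    return {spk: err / n_ref for spk, (err, n_ref) in counts.items()}


def overall_error_rate(counts: Counts) -> float:
    """Pooled error rate over all speakers."""
    total_err = sum(err for err, _ in counts.values())
    total_ref = sum(n_ref for _, n_ref in counts.values())
    return total_err / total_ref


def sign_test(rates_a: Dict[str, float], rates_b: Dict[str, float]) -> float:
    """Two-sided exact sign test on paired per-speaker rates; ties are dropped."""
    wins_a = sum(1 for s in rates_a if rates_a[s] < rates_b[s])
    wins_b = sum(1 for s in rates_a if rates_a[s] > rates_b[s])
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test, capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


system_a: Counts = {f"spk{i}": (5, 100) for i in range(8)}
system_b: Counts = {f"spk{i}": (9, 100) for i in range(8)}
p = sign_test(speaker_error_rates(system_a), speaker_error_rates(system_b))
print(overall_error_rate(system_a), overall_error_rate(system_b), p)
```

With system A better for all eight speakers, the sign test gives p = 2/256, so the difference would be judged significant at the usual 5% level; a pooled error rate alone does not reveal whether the improvement is consistent across speakers.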