We will now concentrate on the assessment procedure for speaker-independent, large-vocabulary, continuous speech recognition systems. At the time of writing, these systems are still laboratory systems, but this may change in the near future.
Unlike the simple isolated word recogniser, these systems generally work off-line, i.e. a whole utterance is fed to the recogniser, and some time may pass before the result is produced. Often, these systems are assessed in a completely asynchronous way: first all utterances are supplied to the recognition system, and later all results are submitted to the scoring program. However, research is also taking place on incremental on-line systems.
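The scoring step in such an asynchronous assessment typically aligns each recogniser hypothesis with its reference transcription and counts word errors. The following Python sketch illustrates the idea with a standard dynamic-programming alignment; it is not the actual scoring software used in these evaluations, and the function names and sample data are illustrative assumptions only.

    # Minimal sketch (not the official scoring program): score a batch of
    # recogniser hypotheses against reference transcriptions by counting
    # word errors with a dynamic-programming (edit distance) alignment.

    def word_errors(reference, hypothesis):
        """Return (errors, reference length), where errors is the minimum
        number of substitutions, insertions and deletions needed to turn
        the hypothesis into the reference."""
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j]: edit distance between ref[:i] and hyp[:j]
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i                       # deletions only
        for j in range(len(hyp) + 1):
            d[0][j] = j                       # insertions only
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution/match
        return d[len(ref)][len(hyp)], len(ref)

    def word_error_rate(pairs):
        """pairs: iterable of (reference, hypothesis) utterance strings,
        i.e. the collected output of an off-line recognition run."""
        errors = words = 0
        for ref, hyp in pairs:
            e, n = word_errors(ref, hyp)
            errors += e
            words += n
        return errors / words

    if __name__ == "__main__":
        # Hypothetical one-utterance batch for illustration.
        batch = [("show me the flights to boston",
                  "show me flights to austin")]
        print(f"WER: {word_error_rate(batch):.1%}")

On this sample batch the hypothesis deletes "the" and substitutes "austin" for "boston", giving 2 errors in 6 reference words, i.e. a word error rate of 33.3%. Because scoring operates purely on the collected transcriptions, it can run long after the recognition pass, which is what makes the fully asynchronous assessment described above possible.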
The organisation with the most experience in the assessment of large vocabulary continuous speech recognition systems is the (Defense) Advanced Research Projects Agency (ARPA/DARPA) in the USA. In 1987 this organisation started to organise benchmarking evaluation tests for continuous speech recognition laboratories, which were coordinated and evaluated by NIST (National Institute of Standards and Technology). This yearly test has been a great stimulus for the competing laboratories and has proved to be a positive impulse for the development of better recognition systems. In the meantime, better training databases have become available, which has also had a positive influence on the results.
This section is heavily based on the ARPA benchmark paradigm, although some experience from the ongoing SQALE project has also been used. The underlying purpose of assessment is therefore benchmarking. In the ARPA paradigm, systems from various laboratories are evaluated competitively. Before the actual assessment test, training material is defined and distributed (including development test material), and a dry run test is performed. After the assessment test, a closed workshop is organised at which the results and the merits of the techniques of the various systems are discussed.