A typical set-up for testing a real-time word recogniser is shown in Figure 10.1. It consists of a playback device, the recognition system and a controlling device. Note that a single person (the experimenter) can act as both the playback and the controlling device; it is wise to try this to get a feel for the recognition process.
Figure 10.1: Simplest set-up for interactive testing of a word recognition system
RECOMMENDATION 5
Before you try to automate the set-up, experiment a little
with the system to get a feel for how it works.
Depending on the level of automation, you can choose from the following options for the playback device:
In most cases the last option is chosen because of its reproducibility and its potential for automation. It also allows control over the duration of the silences between words, the amount of added noise, etc. If the recognition system has a digital input, the analog path can be avoided completely. The controlling and speech-generation functions can easily be performed by the same computer.
One has to take care, however, not to generate ``bursts'' of speech for a connected word recognition system, because such a system is continuously ``listening'': its performance will be influenced by the silences that occur whenever the digital-to-analog converters are not fed with data. Therefore, the entire test signal should first be computed, and then played back in a single pass during the test.
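The advice to compute the whole test signal before playback can be sketched as follows. This is a minimal illustration, not a prescribed tool: the function name `build_test_signal`, the representation of words as lists of float samples in [-1, 1], and the choice of white Gaussian noise scaled to a target speech-to-noise ratio (measured over the speech samples only) are all assumptions. It also returns the onset time of every word, which the controlling computer can later use to relate the recogniser's responses to the stimuli.

```python
import math
import random


def build_test_signal(words, silence_s, sample_rate=16000, snr_db=None, seed=0):
    """Concatenate word tokens into ONE continuous signal for a single playback.

    words:      list of (label, samples); samples are floats in [-1, 1]
    silence_s:  silence inserted before each word (and at the end), in seconds
    snr_db:     if given, add white Gaussian noise so that the speech-to-noise
                ratio, measured over the speech samples only, equals snr_db
                (an assumed SNR definition; others exist)
    Returns (signal, onsets), where onsets is a list of (label, onset_s).
    Values may exceed [-1, 1] after adding noise; scaling is left to the caller.
    """
    gap = [0.0] * int(round(silence_s * sample_rate))
    signal, onsets = [], []
    speech_energy, speech_count = 0.0, 0
    for label, samples in words:
        signal.extend(gap)                                   # controlled pause
        onsets.append((label, len(signal) / sample_rate))    # word onset time
        signal.extend(samples)
        speech_energy += sum(x * x for x in samples)
        speech_count += len(samples)
    signal.extend(gap)                                       # trailing silence

    if snr_db is not None and speech_energy > 0:
        speech_power = speech_energy / speech_count
        noise_power = speech_power / 10 ** (snr_db / 10)
        rng = random.Random(seed)          # fixed seed: reproducible noise
        sigma = math.sqrt(noise_power)
        signal = [x + rng.gauss(0.0, sigma) for x in signal]
    return signal, onsets
```

For example, two 0.1 s tokens separated by 0.5 s pauses at 16 kHz give a single 27200-sample buffer with onsets at 0.5 s and 1.1 s; the whole buffer is then handed to the playback device in one go, so the converters are never starved mid-test.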
The way the recogniser produces its output depends very much on the system. Nowadays, word recognition systems are most often shipped as a piece of hardware for a Personal Computer, usually together with proprietary software that allows the user to train the recogniser and to set it up for an application. Unless the controlling computer is the same computer on which the recognition system is installed, the easiest way to automate the assessment is to send the output over a communication line (e.g. an RS-232 port). Some ways for the recognition software to respond are:
For the first two possibilities, some way must be found to send the recogniser's responses to the controlling computer. A simple approach for item 2 is to run a terminal emulator program that automatically sends all input to the communication port. The third possibility allows the recognition system to be integrated with the controlling computer. The fourth possibility is the easiest for a standard set-up with a separate controlling computer.
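On the controlling computer, collecting the responses that arrive over the communication line can be sketched as below. The sketch assumes, hypothetically, that the recogniser (or the terminal emulator forwarding its output) sends one recognised word per line. The function name `read_responses` is illustrative; it accepts any object with a `readline()` method returning bytes, such as an `io.BytesIO` in a test or, as an assumption about your environment, a pyserial `serial.Serial` port opened with a read timeout.

```python
import io
import time


def read_responses(stream, clock=time.monotonic, encoding="ascii"):
    """Collect (timestamp, word) pairs from a line-oriented byte stream.

    Assumes one recognised word per line (hypothetical output format).
    Reading stops when readline() returns b"", which a file does at EOF and
    a pyserial port does when its timeout expires with no data received.
    The timestamp is taken when the line reaches the controller, so it
    includes both recognition and transmission latency.
    """
    responses = []
    while True:
        line = stream.readline()
        if not line:                     # EOF, or timeout with nothing received
            break
        word = line.decode(encoding, errors="replace").strip()
        if word:                         # ignore empty lines such as a bare CR-LF
            responses.append((clock(), word))
    return responses


if __name__ == "__main__":
    # Stand-in for a serial port: two responses terminated by CR-LF.
    demo = io.BytesIO(b"yes\r\nno\r\n")
    print(read_responses(demo))
```

Passing the clock as a parameter keeps the function testable and lets the caller use the same time base as the playback, which is needed when the responses are later matched to the words that were played.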
A stand-alone recogniser is often equipped with a serial communication line to receive commands and to output recognised words. Such systems will fit easily into the general set-up.
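Because the test signal is played back in a single pass, the controlling computer has to relate each response to the word that caused it using timing alone. One possible scoring rule is sketched below, under stated assumptions: responses and word onsets share one time base measured from the start of playback, a response belongs to the most recent word whose onset precedes it, and a response arriving more than a hypothetical `max_latency_s` after that onset, or a second response to the same word, is counted as an insertion. The function name and the category names are illustrative, not taken from any particular assessment tool.

```python
from bisect import bisect_right
from collections import Counter


def score_responses(stimuli, responses, max_latency_s=2.0):
    """Tally recogniser responses against the words that were played.

    stimuli:        list of (label, onset_s) in playback order (sorted onsets)
    responses:      list of (time_s, word) on the same time base as the onsets
    max_latency_s:  hypothetical limit on response delay after a word's onset
    Returns a Counter with keys 'correct', 'substitution', 'no_response'
    and 'insertion' (missing keys read as 0).
    """
    onsets = [onset for _, onset in stimuli]
    answered = {}                       # stimulus index -> first response word
    counts = Counter()
    for t, word in sorted(responses):
        idx = bisect_right(onsets, t) - 1   # latest word starting at or before t
        if idx < 0 or t - onsets[idx] > max_latency_s or idx in answered:
            counts["insertion"] += 1
        else:
            answered[idx] = word
    for i, (label, _) in enumerate(stimuli):
        if i not in answered:
            counts["no_response"] += 1
        elif answered[i] == label:
            counts["correct"] += 1
        else:
            counts["substitution"] += 1
    return counts
```

This attribution rule only works if the silence between words is longer than the recogniser's typical response latency; otherwise a late answer to one word is credited to the next. That is one more reason to control the inter-word silence when the test signal is generated.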
After initialisation and training (see Section 10.5.2), the basic assessment procedure in such a set-up is simple:
The ESPRIT project SAM has very carefully defined what is called the ``Sesam Workstation'' as the controlling and speech-generating computer system. For this PC-based platform many tools have been written, including a recogniser assessment tool, ``SAMPAC'' (see Appendix E). It was developed at TNO-TM (The Netherlands) and LIMSI (France); the lab currently developing it is CSELT (Italy).