Technical set-up

A typical set-up for testing a real-time word recogniser is shown in Figure 10.1. The set-up consists of a playback device, the recognition system and a controlling device. Note that a single person (the experimenter) can serve as both the playback and the controlling device; it is wise to try this first, to get a feel for the recognition process.

Figure 10.1: Simplest set-up for interactive testing of a word recognition system

RECOMMENDATION 5
Before you try to automate the set-up, experiment a little with the system to get a feeling for how it works.

Depending on the level of automation, the playback device can be one of the following:

a microphone and amplifier,
an analog recording device,
a digital recording device,
a computer with mass storage and a sound interface (digital-to-analog converter).

In most cases, the last option is chosen, because of its reproducibility and potential for automation. It also allows control over the duration of silence between words, the amount of added noise, etc. If the recognition system has digital input, the analog path can be avoided completely. The functions of controlling and speech generation can easily be performed by the same computer.
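As an illustration of the kind of control a computer-based playback device offers, the following minimal sketch adds noise to a speech signal at a chosen level. It assumes that the level is specified as a signal-to-noise ratio in dB and that both signals are plain lists of samples at the same sampling rate; the function name `mix_at_snr` and its parameters are hypothetical and are not part of any tool described in this chapter.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the speech-to-noise power ratio is snr_db.

    Both inputs are lists of samples at the same rate (an assumption of this
    sketch). The noise is truncated to the length of the speech.
    """
    noise = noise[:len(speech)]
    if len(speech) == 0 or len(noise) < len(speech):
        raise ValueError("need non-empty speech and at least as much noise")
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise signal is all zeros")
    # Gain that brings the noise power to speech_power / 10^(snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

Because the mixing is done in software before playback, the same noisy test signal can be reproduced exactly in later test runs.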

One has to take care, however, that for a connected word recognition system no ``bursts'' of speech are generated, because a connected word system is continuously ``listening''. Performance will be influenced by the silences that occur when the digital-to-analog converter is not fed with data. Therefore, the entire test signal must be computed first, and a single playback of that signal should be used during the test.
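The idea of computing the entire test signal before playback can be sketched as follows. This is only an illustration under stated assumptions: the word recordings are taken to be sequences of 16-bit samples at a common sampling rate, and the pauses are filled with digital zeros (in practice recorded background noise may be more appropriate). The name `build_test_signal` and its parameters are hypothetical.

```python
import array


def build_test_signal(words, sample_rate=16000, gap_seconds=0.5):
    """Concatenate word recordings into one buffer with fixed pauses.

    `words` is a list of sequences of 16-bit sample values (assumed format).
    A pause of `gap_seconds` precedes the first word and follows every word,
    so the converter can be fed one uninterrupted signal.
    """
    gap = [0] * int(round(sample_rate * gap_seconds))
    signal = array.array("h")  # signed 16-bit samples
    signal.extend(gap)         # leading pause
    for samples in words:
        signal.extend(samples)
        signal.extend(gap)     # controlled pause after each word
    return signal
```

The resulting buffer is handed to the sound interface in a single playback call, so the pause lengths are determined by the test design and not by the timing of the controlling program.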

The way the recogniser delivers its output depends very much on the system. Nowadays, word recognition systems are most often shipped as a piece of hardware for a personal computer. Proprietary software is often included that allows the user to train the recogniser and to set it up for an application. Unless the controlling computer is the same as the computer that has the recognition system installed, the easiest way to do the assessment automatically is to send the output over a communication line (e.g. an RS-232 port). Some ways for the recognition software to respond are:

1. Put recognised words in the keyboard buffer.
2. Directly insert the characters into the application (e.g. an editor).
3. Return a string on a library call.
4. Send the recognised word to a serial communication line.

For the first two possibilities, some means has to be devised to send the recogniser's responses to the controlling computer. A simple approach for item 2 is to run a terminal emulator program that automatically sends all input to the communication port. The third possibility allows for integration of the recognition system with the controlling computer. The fourth possibility is easiest for a standard set-up with a separate controlling computer.

A stand-alone recogniser is often equipped with a serial communication line to receive commands and to output recognised words. Such systems will fit easily into the general set-up.
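On the controlling computer, collecting responses from a serial line can be as simple as reading the incoming bytes line by line. The sketch below assumes a purely hypothetical protocol in which the recogniser sends each recognised word as ASCII text terminated by a newline; real systems define their own formats, so this has to be adapted to the documentation of the recogniser at hand. The function name `read_recognised_words` is illustrative, and `stream` stands for any file-like object opened in binary mode on the port.

```python
import io


def read_recognised_words(stream, encoding="ascii"):
    """Yield one recognised word per non-empty line of a binary stream.

    Assumes (hypothetically) newline-terminated text, one word per line.
    Carriage returns and surrounding blanks are stripped.
    """
    for raw_line in stream:
        word = raw_line.decode(encoding, errors="replace").strip()
        if word:
            yield word


# Usage with an in-memory stand-in for the serial line:
fake_line = io.BytesIO(b"one\r\ntwo\r\n\r\nthree\n")
words = list(read_recognised_words(fake_line))
```

Testing the parsing against an in-memory stream first makes it possible to debug the controlling software without the recogniser attached.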

After initialisation and training (see Section 10.5.2), the basic assessment procedure in such a set-up is simple:

Choose a test word, according to the allowed syntax and other defined conditions.
Instruct the recogniser to ``listen''.
Instruct the playback device to play the test word.
Record the recognition result from the recogniser.
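The steps above can be sketched as a small control loop. The three callables `listen`, `play` and `read_result` are hypothetical stand-ins for whatever interface the recogniser and the playback device actually offer, and the list `test_words` is assumed to have been chosen beforehand according to the allowed syntax. The scoring function is a plain proportion of correctly recognised words, included only to show how the recorded results might be used.

```python
def run_assessment(test_words, listen, play, read_result):
    """Run the basic procedure once per test word.

    Returns a list of (spoken_word, recognised_word) pairs.
    """
    results = []
    for word in test_words:          # test word chosen beforehand
        listen()                     # instruct the recogniser to listen
        play(word)                   # instruct the playback device
        recognised = read_result()   # record the recognition result
        results.append((word, recognised))
    return results


def proportion_correct(results):
    """Fraction of test words whose recognised label matches the spoken one."""
    if not results:
        return 0.0
    correct = sum(1 for spoken, recognised in results if spoken == recognised)
    return correct / len(results)


# Usage with dummy devices in place of real hardware:
played = []
answers = iter(["one", "two", "tree"])
results = run_assessment(
    ["one", "two", "three"],
    listen=lambda: None,
    play=played.append,
    read_result=lambda: next(answers),
)
```

Keeping the device access behind three small functions means the same loop can drive an interactive test, a serial-line set-up or a library-call integration.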

The ESPRIT project SAM has very carefully defined what is called the ``Sesam Workstation'' as the controlling and speech generating computer system. For this PC-based platform many tools have been written, including a recogniser assessment tool, ``SAMPAC'' (see Appendix E). It was developed at TNO-TM (The Netherlands) and LIMSI (France); the current developing lab is CSELT (Italy).

EAGLES SWLG SoftEdition, May 1997.