
Technical set-up

A typical set-up for testing a real-time word recogniser is shown in Figure 10.1. The set-up consists of a playback device, the recognition system and a controlling device. Note that a single person (the experimenter) can serve as both the playback and the controlling device; it is wise to try this first to get a feel for the recognition process.

Figure 10.1: Simplest set-up for interactive testing of a word recognition system

Before you try to automate the set-up, experiment a little with the system to get a feeling for how it works.

Depending on the level of automation, you can choose for the playback  device:

In most cases, the last option is chosen, because of its reproducibility and potential for automation. It also allows control over the duration of silence between words, added noise, etc. If the recognition system has a digital input, the analog path can be avoided completely. The functions of controlling and speech generation can easily be performed by the same computer.

One has to take care, however, that no ``bursts'' of speech are generated for a connected word recognition system, because such a system is continuously ``listening''. Performance will be influenced by the silences that occur when the digital-to-analog converters are not fed with data. Therefore, the entire test signal must be computed first, and a single playback of that signal should be used during the test.
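The idea of computing the whole test signal before playback can be sketched as follows. This is only a minimal illustration, assuming the word recordings are available as sequences of signed 16-bit samples; the function name `build_test_signal`, the sample rate and the gap duration are hypothetical choices and not values prescribed here. The gaps are filled with digital zeros; in practice one might prefer recorded background noise.

```python
from array import array

def build_test_signal(words, sample_rate=16000, gap_seconds=0.5):
    """Concatenate word waveforms into one buffer, separated by silence.

    `words` is a sequence of sample sequences (signed 16-bit values).
    The gap length is an assumed parameter, chosen by the experimenter.
    """
    gap = array("h", [0] * int(round(sample_rate * gap_seconds)))
    signal = array("h")
    signal.extend(gap)                      # leading silence
    for samples in words:
        signal.extend(array("h", samples))  # the word itself
        signal.extend(gap)                  # controlled silence after it
    return signal

# Two dummy "words" of 3 and 2 samples, with 4-sample gaps
demo = build_test_signal([[100, -100, 50], [7, 8]],
                         sample_rate=8, gap_seconds=0.5)
```

The resulting buffer would then be handed to the playback device in one piece, so that the converter is fed continuously for the whole test.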

The way the recogniser delivers its output depends very much on the system. Nowadays, word recognition systems are most often shipped as a piece of hardware for a Personal Computer. Proprietary software is often included that allows the user to train the recogniser and to set it up for an application. Unless the controlling computer is the same as the computer in which the recognition system is installed, the easiest way to automate the assessment is to send the output over a communication line (e.g. an RS-232 port). The recognition software may respond in one of the following ways:

  1. Put recognised words in the keyboard buffer.
  2. Directly insert the characters into the application (e.g. an editor).
  3. Return a string on a library call.
  4. Send the recognised word to a serial communication line.

For the first two possibilities, some means has to be devised of passing the recogniser's responses to the controlling computer. A simple approach for item 2 is to run a terminal emulator program, which will automatically send all input to the communication port. The third possibility allows for integration of the recognition system with the controlling computer. The fourth possibility is easiest for a standard set-up with a separate controlling computer.

A stand-alone recogniser is often equipped with a serial communication line to receive commands and to output recognised words. Such systems will fit easily into the general set-up.
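On the controlling computer, collecting responses from such a line might look like the sketch below. It assumes, purely for illustration, that the recogniser sends one recognised word per line as ASCII text terminated by a newline; actual devices define their own protocols. The function `read_recognised_words` is a hypothetical helper that accepts any binary stream offering `readline()`, and an in-memory stream stands in here for the real serial port.

```python
import io

def read_recognised_words(stream, count):
    """Read up to `count` newline-terminated responses from a binary stream.

    Assumed protocol: one ASCII word per line (not a documented standard).
    """
    words = []
    for _ in range(count):
        line = stream.readline()
        if not line:              # nothing more to read: stop early
            break
        words.append(line.decode("ascii", errors="replace").strip())
    return words

# Stand-in for the serial line: an in-memory byte stream
fake_line = io.BytesIO(b"one\r\ntwo\r\nseven\r\n")
received = read_recognised_words(fake_line, 3)
```

Keeping the parsing separate from the port handling means the same routine can be exercised without the recogniser attached.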

After initialisation and training (see Section 10.5.2), the basic assessment procedure in such a set-up is simple:

  1. Choose a test word, according to the allowed syntax and other defined conditions.
  2. Instruct the recogniser to ``listen''.
  3. Instruct the playback device to play the test word.
  4. Record the recognition result from the recogniser.
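The four steps above can be sketched as a simple control loop. This is an illustrative outline only: `listen`, `play` and `read_result` are hypothetical callables standing in for whatever interface the recogniser and playback device actually provide, and the test word is drawn uniformly at random from the vocabulary, which ignores any syntax constraints a real test would have to respect.

```python
import random

def run_assessment(vocabulary, listen, play, read_result, n_trials, seed=0):
    """Run n_trials of the choose / listen / play / record cycle."""
    rng = random.Random(seed)      # fixed seed for a reproducible word order
    log = []
    for _ in range(n_trials):
        word = rng.choice(vocabulary)        # 1. choose a test word
        listen()                             # 2. tell the recogniser to listen
        play(word)                           # 3. play the test word
        log.append((word, read_result()))    # 4. record the result
    return log

# Demo with a fake recogniser that simply echoes the played word
state = {}
log = run_assessment(
    ["yes", "no", "stop"],
    listen=lambda: state.update(heard=None),
    play=lambda w: state.update(heard=w),
    read_result=lambda: state["heard"],
    n_trials=5,
)
correct = sum(1 for spoken, recognised in log if spoken == recognised)
```

Logging the (spoken, recognised) pairs rather than only a running score keeps the raw results available for later analysis.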

The ESPRIT project SAM has very carefully defined what is called the ``Sesam Workstation'' as the controlling and speech generating computer system. For this PC-based platform many tools have been written, including a recogniser assessment tool, ``SAMPAC'' (see Appendix E). It was developed at TNO-TM (The Netherlands) and LIMSI (France); the current developing lab is CSELT (Italy).


EAGLES SWLG SoftEdition, May 1997.