A representative database consists of data collected ``in the field''. It is a collection of speech material that is very specific to the application: the same speakers, representative recording conditions, and words similar to those the application will actually encounter. This is the ultimate test for evaluating the recognition system under the very specific conditions and requirements of the application.
The procedure may lead to a decision on whether or not to actually use the system.
However, a field trial is often not a good starting point for assessment, because many recognition systems have to be trained, or their parameters tuned, to optimise recognition performance. A further disadvantage is that during field experiments many parameters are uncontrolled, and when things go wrong the experiment cannot simply be repeated. Recording a representative database is also very time consuming and expensive.
One of the most important principles of recogniser assessment is that one cannot use a test database more than once for a particular system: as soon as there is feedback from the assessment results to the training state of the system, the system is effectively trained to perform well on that specific test. This matters because the assessment test is often only a small sample of the material that the system will actually encounter in the application, and such a sample can only be representative of the application if it ``has never been seen''.
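The principle above can be made concrete with a conventional three-way partition of the material (a hypothetical sketch, not part of the original text): all tuning feedback is absorbed by a development set, while the test set is sealed and evaluated exactly once.

```python
import random

def split_corpus(utterances, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Partition a corpus into train/dev/test sets.

    The development set absorbs all tuning feedback; the test set is
    touched exactly once, for the final evaluation.
    """
    rng = random.Random(seed)           # fixed seed: the split must be reproducible
    items = list(utterances)
    rng.shuffle(items)
    n_test = int(len(items) * test_fraction)
    n_dev = int(len(items) * dev_fraction)
    test = items[:n_test]               # sealed until the final run
    dev = items[n_test:n_test + n_dev]  # used for parameter tuning
    train = items[n_test + n_dev:]
    return train, dev, test

# Hypothetical corpus of utterance identifiers
corpus = [f"utt_{i:03d}" for i in range(100)]
train, dev, test = split_corpus(corpus)
print(len(train), len(dev), len(test))  # → 80 10 10
```

Any result fed back into training or tuning must come from the development set only; once the test set has influenced a design decision, it is "seen" and a fresh test set is needed.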
RECOMMENDATION 2
Be somewhat reserved about directly using all the available
representative test material for evaluation. You may want to run
some other tests before you do this.
The recognition score gives an idea of how well the system will perform in the selected application. It is important that the test utterances are a representative sample of the application. The conditions to which the sample should conform are:
Most of these points apply only to the assessment of large vocabulary systems.
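As an illustration of how a recognition score is typically computed for word-based systems (a minimal sketch; the word error rate metric and this implementation are assumptions, not taken from the text), the test transcriptions are aligned with the system's output by edit distance over word tokens:

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) divided by
    the number of reference words, via Levenshtein distance over tokens."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                           # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                           # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# One word deleted out of six reference words
print(round(word_error_rate("the cat sat on the mat", "the cat sat on mat"), 3))  # → 0.167
```

The score is only meaningful to the extent that the reference utterances satisfy the sampling conditions listed above.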