A representative database consists of data collected ``in the field''. It is a collection of speech material that is very specific to the application: the same speakers, representative recording conditions, and words similar to those the application will actually encounter. This is the ultimate test for evaluating the recognition system under the very specific conditions and requirements of the application.
The procedure may lead to a decision on whether or not to actually use the system.
However, a field trial is often not a good starting point for assessment, because many recognition systems have to be trained, or their parameters tuned, to optimise recognition performance. A further disadvantage is that during field experiments many parameters are uncontrolled, and when things go wrong the experiment cannot simply be repeated. Recording a representative database is also very time consuming and expensive.
One of the most important principles of recogniser assessment is that one cannot use a test database more than once for a particular system: as soon as there is feedback from the assessment results to the training state of the system, the system is effectively trained to perform well on that specific test. This matters because the assessment test is often only a small sample of the material that the system will actually encounter in the application, and such a sample can only be representative of the application if it ``has never been seen''.
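The principle above can be made concrete with a conventional three-way partition of the material (a hypothetical sketch, not part of the original text): all tuning feedback is absorbed by a development set, while the test set is sealed and evaluated exactly once.

```python
import random

def split_corpus(utterances, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Partition a corpus into train/dev/test sets.

    The development set absorbs all tuning feedback; the test set is
    touched exactly once, for the final evaluation.
    """
    rng = random.Random(seed)           # fixed seed: the split must be reproducible
    items = list(utterances)
    rng.shuffle(items)
    n_test = int(len(items) * test_fraction)
    n_dev = int(len(items) * dev_fraction)
    test = items[:n_test]               # sealed until the final run
    dev = items[n_test:n_test + n_dev]  # used for parameter tuning
    train = items[n_test + n_dev:]
    return train, dev, test

# Hypothetical corpus of utterance identifiers
corpus = [f"utt_{i:03d}" for i in range(100)]
train, dev, test = split_corpus(corpus)
print(len(train), len(dev), len(test))  # → 80 10 10
```

Any result fed back into training or tuning must come from the development set only; once the test set has influenced a design decision, it is "seen" and a fresh test set is needed.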
RECOMMENDATION 2
Be somewhat reserved about directly using all the available
representative test material for evaluation. You may want to run
some other tests before you do this.
The recognition score gives an idea of how well the system will perform in the selected application. It is important that the test utterances are a representative sample of the application. The conditions to which the sample should conform are:
Most of these points apply only to the assessment of large vocabulary systems.
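As an illustration of how a recognition score is typically computed for word-based systems (a minimal sketch; the word error rate metric and this implementation are assumptions, not taken from the text), the test transcriptions are aligned with the system's output by edit distance over word tokens:

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) divided by
    the number of reference words, via Levenshtein distance over tokens."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                           # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                           # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# One word deleted out of six reference words
print(round(word_error_rate("the cat sat on the mat", "the cat sat on mat"), 3))  # → 0.167
```

The score is only meaningful to the extent that the reference utterances satisfy the sampling conditions listed above.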