Next: Statistical and experimental procedures
Up: Introduction
Previous: How to read this
In discussing procedural considerations in language engineering, it
will help to make things concrete. Let us assume that a client has
commissioned a company to develop a speech recognition system
(System A) from scratch, with expense no object. The system is to be
used in a European country where any inhabitant might want to use it.
When development is finished, the client wants some idea of how its
performance compares with that of another system already on the
market (System X). The company is given a free hand in developing the
system and, for convenience, would prefer to develop it using read
speech, although the system will eventually have to operate on
spontaneous speech. The team assigned to the project decides to base
the system on Artificial Neural Networks (ANNs).
Questions the team may need to address include:
- How can they check whether there are differences between read
and spontaneous speech, and then decide whether results obtained
with read speech carry over to spontaneous speech?
- If they find differences that require them to use spontaneous
speech, how can they check whether the language statistics of the
sample of recordings they make to train and test the ANNs are
representative of the language as a whole? Whether read or
spontaneous speech is used, the segments must be labelled before the
networks can be trained, and judges must be brought in to do this.
- What procedures are appropriate for the
tasks of labelling and classifying the segments?
- How can the
accuracy of segment boundary placement and
category classification by the judges be assessed?
- How can improvement across development stages be monitored? This
usually involves measuring how accurately the ANNs recall the
training data. Here, differences between judges in segmentation and
classification (see the questions about judges' labelling above)
might affect the assessed recogniser performance. The preceding
checks are vital to ensure that the training data are sound and that
changes in recogniser performance reflect genuine improvements in the
architecture rather than artefacts of poor training data. An apparent
improvement may be genuine, or it may arise because a judge made
labelling errors and some change to the system leads it to make the
same errors, which are then scored as correct (i.e., the two errors
cancel each other out). Without proper assessment of the judges'
performance, the second possibility can never be ruled out.
- How are appropriate test data chosen?
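To make the error-cancellation argument concrete, here is a minimal sketch that scores a recogniser's output against judge-supplied reference labels. All labels are invented for illustration (hypothetical vowel categories for eight segments), and the "true" labels are assumed to be known, which in practice they never are; that is precisely why the judges' performance has to be assessed separately.

```python
def accuracy(predicted, reference):
    """Fraction of segments whose predicted label matches the reference label."""
    assert len(predicted) == len(reference)
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical segment labels; in practice the true labels are unknown.
true_labels = ["a", "e", "i", "o", "u", "a", "e", "i"]
# The judge mislabels segment 3 ("o" recorded as "u").
judge_labels = ["a", "e", "i", "u", "u", "a", "e", "i"]

# Recogniser before a change: right on segment 3, wrong on segment 6.
before = ["a", "e", "i", "o", "u", "a", "i", "i"]
# After the change: it now reproduces the judge's error on segment 3
# and is still wrong on segment 6.
after = ["a", "e", "i", "u", "u", "a", "i", "i"]

for name, output in [("before", before), ("after", after)]:
    print(f"{name}: scored against judge = {accuracy(output, judge_labels):.3f}, "
          f"against true labels = {accuracy(output, true_labels):.3f}")
```

Scored against the judge's labels, accuracy appears to rise from 0.750 to 0.875, while against the true labels it actually falls from 0.875 to 0.750: the apparent improvement is really a degradation, produced by the recogniser matching the judge's mistake.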
These questions highlight some of the statistical analyses and
experimental procedures that language engineering requires. Although
they arise from one particular scenario, they are typical of many
similar problems that language engineers encounter. The rest of this
chapter sets out to answer these and other questions.
The remainder of the chapter is organised in five main sections:
statistical (9.2) and experimental (9.3) techniques for ensuring
that the corpora used for training and testing are representative,
and assessing speech recognition (9.4), speaker verification (9.5)
and dialogue systems (9.6).
Sections 9.2 and 9.3 introduce statistical analysis and
experimentation and should be read by anyone without a background in
these subjects. The material in Sections 9.2, 9.3 and 9.4 focuses
specifically on the hypothetical scenario outlined above.
EAGLES SWLG SoftEdition, May 1997.