Role of statistical analysis and experimentation in Language Engineering Standards (LES)

In discussing procedural considerations in language engineering, it helps to make things concrete. Assume that a client has commissioned the development of a speech recognition system (System A) from scratch, with expense no object. The system is to be employed in a European country where any inhabitant might want to use it. Ultimately, the client wants some idea of how its performance compares with that of another system on the market (System X). The company developing the system is given a free hand and would prefer, for convenience, to develop it on the basis of read speech, although the system will eventually have to operate with spontaneous speech. The team assigned to the project decides to base the system on Artificial Neural Networks (ANNs).

Some of the questions that the team may decide to address are:

How can they check whether there are differences between spontaneous and read speech, and then decide whether results obtained with read speech apply to spontaneous speech?
If they find differences between read and spontaneous speech that require them to use the latter, how can they check whether language statistics from the sample of recordings they make to train and test the ANNs are representative of the language as a whole? Whether read or spontaneous speech is used, segments need to be labelled for training the networks, and judges need to be brought in for this purpose.
What procedures are appropriate for the tasks of labelling and classifying the segments?
How can the accuracy of segment boundary placement and category classification by the judges be assessed?
How can improvement during the development stages be monitored? This usually involves checking correct recall of the training data by the ANNs. Here, segmentation and classification differences between judges (see the two preceding questions) might affect assessed recogniser performance. The preceding tests are vital to ensure that the training data are good and that changes in recogniser performance reflect improvements in the architecture, not artefacts of poor training data. An apparent improvement in recogniser performance can be due to a genuine improvement; alternatively, a judge might have made errors, and some change might lead the system to make the same errors, which would then be scored as correct (i.e., the two errors cancel out). Without appropriate assessment of the judges' performance, the latter possibility can never be ruled out.
How are appropriate test data chosen?
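
As an illustration of the question about assessing the judges' category classification, the sketch below computes Cohen's kappa, one common chance-corrected measure of agreement between two judges who label the same segments. It is a minimal sketch and not a procedure prescribed by this chapter: the function name `cohen_kappa`, the category names and the label sequences are hypothetical, and agreement between judges is only an indirect indication of accuracy, since two judges can agree and both be wrong.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two judges who labelled the same segments."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both judges must label the same non-empty set of segments")
    n = len(labels_a)
    # Observed agreement: proportion of segments given the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each judge's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        # Both judges used one and the same label throughout.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical labels for eight segments (illustrative only, not real data).
judge_1 = ["vowel", "vowel", "stop", "fricative", "stop", "vowel", "nasal", "stop"]
judge_2 = ["vowel", "vowel", "stop", "stop", "stop", "vowel", "nasal", "fricative"]

kappa = cohen_kappa(judge_1, judge_2)
print(f"kappa = {kappa:.3f}")
```

Here the judges agree on 6 of the 8 segments (0.75), but part of that agreement would be expected by chance given how often each judge uses each category, so kappa is lower than the raw agreement rate. A kappa near 1 indicates agreement well beyond chance; a value near 0 indicates agreement no better than chance.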

These points highlight some of the statistical analysis and experimental procedures that need to feature in language engineering. Moreover, the specific questions raised, though they pertain to one particular scenario, are illustrative of many similar problems that language engineers encounter. We now set out to answer these (and other) questions.

The remainder of the chapter is organised in five main sections (9.2-9.6): (9.2) statistical and (9.3) experimental techniques to ensure that the corpora employed for training and testing are representative, (9.4) assessing speech recognition, (9.5) speaker verification and (9.6) dialogue systems. Sections 9.2 and 9.3 introduce statistical analysis and experimentation, and should be read by anyone who does not have a background in these subjects. The material in sections 9.2, 9.3 and 9.4 is specifically focussed on the hypothetical scenario outlined above.

EAGLES SWLG SoftEdition, May 1997.