A number of parameters define the capability of a speech recognition
system. In Table 10.1 these parameters are categorised. The
classification made here is based upon the typical design
considerations of a recognition system, which may be closely related
to a specific application or task. In general, these parameters are
fixed in the system in one way or another. For each of the categories,
the extremes of an easy and difficult task, from the recogniser's
point of view, are given.
- Vocabulary size
- The vocabulary size is of importance to
the recogniser and its performance. The vocabulary is
defined as the set of words that the recogniser can select from, i.e. the words it can refer to. Where there are few
choices, recognition is obviously easier than if the vocabulary
is large. The adjectives ``small'', ``medium'' and ``large'' are applied
to vocabulary sizes of the order of 100, 1000 and (over) 5000
words, respectively. A typical small vocabulary recogniser can
recognise only ten digits; a typical large vocabulary
recognition system, 20000 words.
- Speech type
- There is a distinction between
``isolated words'', ``connected words'' and
``continuous speech''. For isolated words, the beginning and the end
of each word can be detected directly from the energy of the signal.
This makes the job of word boundary detection
(segmentation), and often that of recognition, a lot easier
than if the words are connected or even
continuous, as is the case
for natural connected discourse. The difference in classification between
``connected words'' and ``continuous speech'' is somewhat
technical. A connected word recogniser uses words as recognition
units, which can be trained in an isolated word mode. Continuous speech
is generally associated with large vocabulary recognisers that
use subword units such as phones as recognition units, and can be trained
with continuous speech.
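The energy-based boundary detection mentioned above can be sketched as follows for an isolated word; the frame length, threshold and synthetic signal are illustrative choices, not values from the text:

```python
# Minimal sketch of energy-based word boundary detection for isolated words:
# frames whose short-time energy exceeds a threshold are labelled as speech.

def frame_energies(samples, frame_len=160):
    """Short-time energy of each non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def detect_word(energies, threshold):
    """Return (start_frame, end_frame) of the region above threshold, or None."""
    speech = [i for i, e in enumerate(energies) if e > threshold]
    if not speech:
        return None
    return speech[0], speech[-1]

# Toy signal: silence, a loud "word", silence again.
signal = [0.01] * 320 + [0.5] * 480 + [0.01] * 320
energies = frame_energies(signal)
print(detect_word(energies, threshold=1.0))  # → (2, 4)
```

Real endpoint detectors add refinements (adaptive thresholds, zero-crossing rates, hangover frames), but the principle is just this energy comparison.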
- Speaker dependency
- The recognition task can be either speaker dependent or speaker independent.
Speaker independent recognition is more difficult, because the internal
representation of the speech must somehow be global enough to cover
all types of voices and all possible ways of pronouncing words, and
yet specific enough to discriminate between the various words of the
vocabulary.
For a speaker dependent system the training is usually
carried out by the user, but for applications such as large
vocabulary dictation systems this is too time consuming for an
individual user. In such cases an intermediate technique known as
speaker adaptation is used. Here, the system is
bootstrapped with speaker-independent models,
and then gradually
adapts to the specific aspects of the user.
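One common formulation of this gradual adaptation (an illustrative choice here, not necessarily what any particular system uses) is MAP adaptation of a Gaussian mean: the speaker-independent mean acts as a prior that is pulled towards the user's data as adaptation frames accumulate.

```python
# Hedged sketch of MAP speaker adaptation of a single Gaussian mean.
# The relevance factor tau controls how quickly the speaker-independent
# prior gives way to the speaker's own data; tau=10 is an arbitrary choice.

def map_adapt_mean(si_mean, frames, tau=10.0):
    """Interpolate the speaker-independent mean with the speaker's sample mean."""
    n = len(frames)
    if n == 0:
        return si_mean          # no adaptation data: keep the prior
    sample_mean = sum(frames) / n
    return (tau * si_mean + n * sample_mean) / (tau + n)

si_mean = 0.0                    # speaker-independent model mean
print(map_adapt_mean(si_mean, [1.0] * 5))     # few frames: mild shift, ≈ 0.333
print(map_adapt_mean(si_mean, [1.0] * 1000))  # much data: ≈ 0.990, close to the speaker
```

With little data the estimate stays near the bootstrapped model; with much data it converges to a speaker-dependent estimate, which is exactly the intermediate behaviour described above.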
- Grammar
- In order to reduce the effective number
of words to select from, recognition systems are often equipped with some
knowledge of the language. This may vary from very strict
syntax rules, in which the words that may follow one another
are defined by certain rules, to probabilistic language models,
in which the probability of the output sentence is taken into
consideration, based on statistical knowledge of the language. An
objective measure of the ``freedom'' of the grammar is the
perplexity, which measures the average
branching factor of the grammar. The higher the
perplexity, the more words to choose from at each
instant, and hence the more difficult the task. See
Chapter 7 for a detailed discussion on language modelling.
An example of a very simple grammar is a
sentence-generating syntax
that can generate only six different sentences, which vary in the
number of words.
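The original syntax diagram is not reproduced here, but a hypothetical grammar in the same spirit can be written down as rewrite rules and expanded exhaustively; the rules below are an assumption for illustration, not the book's example:

```python
# A toy context-free grammar (hypothetical rules) expanded exhaustively.
# Non-terminals are keys of the dict; anything else is a terminal word.
import itertools

grammar = {
    "S":    [["call", "NAME"], ["please", "call", "NAME"]],
    "NAME": [["john"], ["bob"], ["mary", "jane"]],
}

def expand(symbol):
    """Yield every word sequence the symbol can derive."""
    if symbol not in grammar:          # terminal word
        yield [symbol]
        return
    for rule in grammar[symbol]:
        for parts in itertools.product(*(expand(s) for s in rule)):
            yield [w for part in parts for w in part]

sentences = [" ".join(words) for words in expand("S")]
print(len(sentences))   # → 6
print(sentences)
```

Exactly six sentences come out, ranging from two to four words, mirroring the kind of tightly constrained task described above.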
For an example of statistical knowledge, consider the word million
being recognised. If the domain is financial jargon,
one can make a prediction of the next word, based on the following
excerpt of conditional probabilities:
| Word pair | Conditional probability |
|---|---|
| million acres | 0.00139 |
| million boxes | 0.00023 |
| million canadian | 0.00846 |
| million dollar | 0.0935 |
| million dollars | 0.642 |
| million left | 0.0000081 |
There are almost two out of three chances that the word following
million will be dollars (at least, within the domain
of the Wall Street Journal (WSJ)).
These numbers were calculated from
37 million words of text from a financial newspaper (the WSJ).
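A sketch of how such bigram statistics are used: given the previous word, rank the candidate successors by their conditional probability. The numbers are copied from the excerpt above.

```python
# P(next word | "million"), taken from the WSJ-derived table in the text.
p_next = {
    "acres": 0.00139, "boxes": 0.00023, "canadian": 0.00846,
    "dollar": 0.0935, "dollars": 0.642, "left": 0.0000081,
}

# The language model's best guess for the next word is the argmax.
best = max(p_next, key=p_next.get)
print(best, p_next[best])   # → dollars 0.642
```

In a full recogniser these probabilities are not used alone but combined with the acoustic scores, biasing the search towards likely word sequences.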
- Training
- The way an automatic speech recognition system is trained can vary.
If each word of the vocabulary is trained many times, the system has
an opportunity to build robust models of the words, and
hence a good performance should be expected. Some systems can be
trained with only one example of each word, or even none (if the
models are pre-built). The number of times each word is trained is
called the number of training passes.
Another training issue that defines the capability of a
system is whether or not it can deal with embedded training.
In embedded training the system is trained with strings of words
(utterances) of which the starting and ending points are not
specified explicitly. A typical example is a large vocabulary
continuous speech recognition system that is trained with
whole sentences, of which only the orthographic transcriptions
are available.
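One common way to bootstrap embedded training (an assumed technique for illustration, not a claim about any specific system) is a ``flat start'': the utterance's frames are divided evenly among the words of the transcription as a first, crude alignment, which later re-estimation refines.

```python
# Hedged sketch of flat-start initialization for embedded training: with only
# the orthographic transcription available, assign each word an equal share
# of the utterance's frames as an initial (deliberately crude) segmentation.

def flat_start(n_frames, transcription):
    """Return (word, start_frame, end_frame) spans covering all frames."""
    words = transcription.split()
    per_word = n_frames // len(words)
    spans, start = [], 0
    for i, word in enumerate(words):
        # any leftover frames go to the last word so the spans cover everything
        end = n_frames if i == len(words) - 1 else start + per_word
        spans.append((word, start, end))
        start = end
    return spans

print(flat_start(100, "the deal is worth ten million dollars"))
```

Real systems then iterate: the models trained on this rough segmentation realign the data, and the improved boundaries retrain the models, without word boundaries ever being marked by hand.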