Looking at the basic architecture shown in Figure 7.1,
we see that there are several types of reason why a speech
recognition system, in particular a large-vocabulary
continuous-speech system, can make a recognition error:
- acoustic-phonetic modelling: This part of the system comprises all components related to the acoustic signal:
  - signal analysis;
  - phoneme modelling:
    - the inventory of context-independent and context-dependent phoneme units;
    - in most cases, the phoneme units are represented by Hidden Markov models [Levinson et al. (1983), Bahl et al. (1983)]; any of their details, such as topology and emission probabilities, may have an effect on the error rate;
  - pronunciation lexicon: the pronunciation lexicon serves as the link between the word level and the phoneme units.
  Any of these three levels of acoustic-phonetic modelling can cause recognition errors. For example, a word whose entry in the pronunciation lexicon is incorrect is unlikely to be recognised correctly.
- language modelling: If the language model is poor, it cannot do much to resolve the ambiguities that remain after acoustic recognition.
- search errors: A full, i.e. globally optimal, search is prohibitively expensive for large-vocabulary speech recognition. The globally optimal search is therefore abandoned and replaced by a suboptimal search. Failing to find the globally optimal word sequence may cause additional recognition errors. These search errors will disappear if the search effort is increased sufficiently, so that more hypotheses about the spoken word sequence are evaluated.
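
The notion of a search error can be illustrated with a minimal sketch. It is a toy example and not the search procedure of any particular recogniser: the two-word vocabulary, the score tables `LOCAL` and `TRANS`, and the functions `full_search` and `beam_search` are all hypothetical and chosen only for illustration. Scores are log-like values (higher is better). The full search evaluates every word sequence, whereas the beam search keeps only the best `beam_width` partial hypotheses at each position.

```python
import itertools

# Hypothetical toy scores (log domain, higher is better); not taken from the handbook.
VOCAB = ["a", "b"]

# LOCAL[t][w]: score of word w at position t (stands in for the acoustic score).
LOCAL = [
    {"a": -1.0, "b": -2.0},
    {"a": -1.0, "b": -1.0},
]

# TRANS[(prev, cur)]: score of word cur following word prev
# (stands in for a bigram language model score).
TRANS = {
    ("a", "a"): -5.0, ("a", "b"): -6.0,
    ("b", "a"): -0.5, ("b", "b"): -3.0,
}


def score(seq):
    """Total score of a complete word sequence."""
    total = LOCAL[0][seq[0]]
    for t in range(1, len(seq)):
        total += TRANS[(seq[t - 1], seq[t])] + LOCAL[t][seq[t]]
    return total


def full_search():
    """Globally optimal search: evaluate every possible word sequence."""
    return max(itertools.product(VOCAB, repeat=len(LOCAL)), key=score)


def beam_search(beam_width):
    """Suboptimal search: keep only the best beam_width partial hypotheses."""
    hyps = [((w,), LOCAL[0][w]) for w in VOCAB]
    hyps = sorted(hyps, key=lambda h: h[1], reverse=True)[:beam_width]
    for t in range(1, len(LOCAL)):
        expanded = [
            (seq + (w,), s + TRANS[(seq[-1], w)] + LOCAL[t][w])
            for seq, s in hyps
            for w in VOCAB
        ]
        hyps = sorted(expanded, key=lambda h: h[1], reverse=True)[:beam_width]
    return hyps[0]  # (best surviving sequence, its score)


if __name__ == "__main__":
    print("full search:", full_search(), score(full_search()))
    print("beam width 1:", beam_search(1))
    print("beam width 2:", beam_search(2))
```

With a beam width of 1, the hypothesis starting with "b" is pruned after the first position because it looks worse locally, so the search ends with ("a", "a") at a score of -7.0 although the best sequence is ("b", "a") at -3.5: this is a search error. With a beam width of 2, the pruned hypothesis survives and the beam search returns the same sequence as the full search.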
EAGLES SWLG SoftEdition, May 1997.