Every approach to automatic speech recognition is faced with the problem of making decisions in the presence of ambiguity and context, and of modelling the interdependence of these decisions at various levels. If it were possible to recognise phonemes (or words) with very high reliability, it would not be necessary to rely heavily on delayed-decision techniques, error-correcting techniques and statistical methods. In the near future, this problem of reliable and virtually error-free phoneme or word recognition without using high-level knowledge is unlikely to be solved for large-vocabulary continuous-speech recognition. As a consequence, the recognition system has to deal with a large number of hypotheses about phonemes, words and sentences, and ideally has to take into account the ``high-level constraints'' as given by syntax, semantics and pragmatics. Given this state of affairs, statistical decision theory tells us how to minimise the probability of recognition errors [Bahl et al. (1983)].
The word sequence $w_1^N = w_1 \ldots w_N$ to be recognised from the sequence of acoustic observations $x_1^T = x_1 \ldots x_T$ is determined as that word sequence for which the posterior probability $\Pr(w_1^N \mid x_1^T)$ attains its maximum. The sequence of acoustic vectors $x_1^T$ over time $t = 1, \ldots, T$ is derived from the speech signal in the preprocessing step of acoustic analysis. Statistical decision theory leads to the so-called Bayes decision rule, which can be written in the form:

$$\hat{w}_1^N = \operatorname*{arg\,max}_{w_1^N} \left\{ \Pr(w_1^N) \cdot \Pr(x_1^T \mid w_1^N) \right\}$$

where $\Pr(x_1^T \mid w_1^N)$ is the conditional probability, given the word sequence $w_1^N$, of observing the sequence of acoustic vectors $x_1^T$, and where $\Pr(w_1^N)$ is the prior probability of producing the word sequence $w_1^N$.
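The form of the decision rule is a direct consequence of Bayes' theorem. Writing $w_1^N$ for a candidate word sequence and $x_1^T$ for the observed sequence of acoustic vectors, the denominator $\Pr(x_1^T)$ is the same for all candidate word sequences and can therefore be dropped from the maximisation:

$$\hat{w}_1^N = \operatorname*{arg\,max}_{w_1^N} \Pr(w_1^N \mid x_1^T) = \operatorname*{arg\,max}_{w_1^N} \frac{\Pr(w_1^N) \cdot \Pr(x_1^T \mid w_1^N)}{\Pr(x_1^T)} = \operatorname*{arg\,max}_{w_1^N} \left\{ \Pr(w_1^N) \cdot \Pr(x_1^T \mid w_1^N) \right\}$$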
The application of the Bayes decision rule 
to the speech recognition problem is illustrated in
Figure 7.1.
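For a fixed (toy) set of candidate word sequences, the decision rule amounts to an argmax over the product of language-model prior and acoustic likelihood. The following sketch illustrates this; the word sequences and all probability values are hypothetical and serve only to show the mechanics of the rule:

```python
import math

# Hypothetical prior probabilities Pr(w) from a language model,
# for a small set of candidate word sequences.
language_model = {
    ("recognise", "speech"): 0.6,
    ("wreck", "a", "nice", "beach"): 0.4,
}

# Hypothetical acoustic likelihoods Pr(x | w) for one fixed
# observation sequence x (numbers invented for illustration).
acoustic_model = {
    ("recognise", "speech"): 1e-5,
    ("wreck", "a", "nice", "beach"): 2e-5,
}

def bayes_decision(language_model, acoustic_model):
    """Return the word sequence maximising Pr(w) * Pr(x | w).

    The maximisation is done in the log domain, the usual practice
    in recognisers, since the likelihoods are extremely small.
    """
    return max(
        language_model,
        key=lambda w: math.log(language_model[w]) + math.log(acoustic_model[w]),
    )

print(bayes_decision(language_model, acoustic_model))
# -> ('wreck', 'a', 'nice', 'beach'), since 0.4 * 2e-5 > 0.6 * 1e-5
```

Note how the acoustically better-matching sequence wins here despite its lower prior; in a real system the search is not an enumeration of a handful of sequences but a dynamic-programming search over a very large hypothesis space.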
 
Figure 7.1: Bayes decision rule for speech recognition 
The decision rule requires two types of probability distribution, which we refer to as stochastic knowledge sources, along with a search strategy: