This chapter gives an overview of
language modelling in the context of
large vocabulary speech
recognition
and covers the following topics:
- Why do we need language modelling in a speech recognition
system, and what are the particular problems?
- Stochastic language models
are introduced to
capture the inherent redundancy of the
language subset relevant for the specific recognition task.
- The definition of so-called perplexity
or, to be more exact, corpus perplexity is introduced as
a quantitative measure of the language constraints;
it depends on both the chosen language model and
the test corpus (a formal definition is given after this list).
- The need for smoothing techniques in language modelling
is discussed. Smoothing will be based on the
so-called leaving-one-out technique, which can be viewed
as a special type of cross-validation.
- To illustrate the specific problems in language
modelling, we consider smoothing methods for the widely and
successfully used bigram and trigram language
models.
- For smoothing, three techniques are presented in detail:
linear discounting,
linear interpolation and
absolute discounting (the basic formulas are sketched after this list).
- A detailed description of a full trigram model is given.
As an extension, the combination of a trigram model
with the so-called cache model is considered
(a small code sketch after this list illustrates the interpolation and cache ideas).
Experimental results are discussed for the
Wall Street Journal (WSJ) corpus along with
practical issues in the implementation
of language models.
- We describe refinements over the standard trigram model
by using word classes that are automatically learned.
We also discuss grammar-based language models.
In particular, so-called link grammars are
able to capture long-range dependencies,
as opposed to the short-range dependencies modelled
by trigram models.
- In a recognition system, the language model is used
during the decision process, which is usually referred to
as search in speech recognition.
To illustrate the integration of the language model
into the recognition procedure, we study
two search techniques in detail, namely
the search for the single best sentence
and
the generation of word graphs or lattices.
- The advantage of the word graph
is that
the acoustic recognition can be decoupled from
postprocessing steps like applying the language model
or the dialogue module (a minimal word graph rescoring sketch is given after this list).
Here significant progress has been achieved, so that
reliable and yet compact word graphs (lattices) can now be generated.
- In a final note (Section 7.8), we describe the mathematical
details of the smoothing techniques, in particular
the EM algorithm.
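To make the perplexity item above concrete: for a test corpus $w_1 \ldots w_N$ and a language model $p$, the corpus perplexity is defined as

\[
PP = \left[ \prod_{n=1}^{N} p(w_n \mid w_1 \ldots w_{n-1}) \right]^{-1/N} ,
\]

i.e. the inverse geometric mean of the word probabilities. Informally, the perplexity can be read as the average number of word choices the recognizer faces at each position, so a lower perplexity means tighter language constraints.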
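For orientation, the three smoothing techniques can be written as follows in one common formulation; the precise variants used in this chapter are developed later. For a trigram history $h = (u, v)$, let $N(h, w)$ be the training count of word $w$ after $h$, $N(h)$ the total count of $h$, and $f(\cdot)$ the corresponding relative frequencies. Linear discounting scales down all seen events by a factor $\lambda$ and gives the freed probability mass to the unseen events via a suitably normalized backing-off distribution $\beta$:

\[
p(w \mid h) =
\begin{cases}
(1 - \lambda)\, f(w \mid h) & \text{if } N(h, w) > 0 , \\
\lambda\, \beta(w) & \text{otherwise.}
\end{cases}
\]

Linear interpolation mixes trigram, bigram and unigram relative frequencies with weights summing to one:

\[
p(w \mid u, v) = \lambda_3\, f(w \mid u, v) + \lambda_2\, f(w \mid v) + \lambda_1\, f(w) .
\]

Absolute discounting subtracts a constant $b$ from each seen count, where $n_+(h)$ denotes the number of distinct words observed after $h$:

\[
p(w \mid h) = \frac{\max\{ N(h, w) - b,\, 0 \}}{N(h)} + \frac{b\, n_+(h)}{N(h)}\, \beta(w) .
\]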
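As an illustration only, here is a minimal Python sketch of linear interpolation and the cache extension. It is not the chapter's implementation, and the interpolation weights are invented placeholders rather than values estimated by leaving-one-out.

    from collections import Counter

    def train_counts(words):
        """Collect unigram, bigram and trigram counts from a training text."""
        uni = Counter(words)
        bi = Counter(zip(words, words[1:]))
        tri = Counter(zip(words, words[1:], words[2:]))
        return uni, bi, tri, len(words)

    def p_interp(u, v, w, uni, bi, tri, total, lambdas=(0.6, 0.3, 0.1)):
        """p(w | u, v) as a weighted sum of trigram, bigram and unigram
        relative frequencies. In practice the weights are estimated on
        held-out data (e.g. by leaving-one-out); here they are fixed."""
        l3, l2, l1 = lambdas
        f3 = tri[(u, v, w)] / bi[(u, v)] if bi[(u, v)] else 0.0
        f2 = bi[(v, w)] / uni[v] if uni[v] else 0.0
        f1 = uni[w] / total
        return l3 * f3 + l2 * f2 + l1 * f1

    def p_with_cache(static_prob, w, recent, mu=0.1):
        """Interpolate the static estimate with a cache estimate: the
        relative frequency of w among the most recent words."""
        cache = recent.count(w) / len(recent) if recent else 0.0
        return (1.0 - mu) * static_prob + mu * cache

For example, after uni, bi, tri, total = train_counts(text.split()), the combined estimate for a word w given its two predecessors u, v and a window recent of the last few hundred words is p_with_cache(p_interp(u, v, w, uni, bi, tri, total), w, recent).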
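Similarly, the decoupling offered by word graphs can be suggested in a few lines: the acoustic pass emits edges carrying acoustic scores, and the language model is applied in a separate rescoring pass over the graph. Everything below (the node numbering, the scores, the stub language model) is a hypothetical toy example, not the chapter's method.

    # Each edge of the word graph: (start_node, end_node, word, acoustic_log_score).
    edges = [
        (0, 1, "this", -2.0), (0, 1, "the", -1.5),
        (1, 2, "guy", -3.0), (1, 2, "sky", -2.5),
        (2, 3, "is", -1.0),
        (3, 4, "blue", -2.0), (3, 4, "glue", -2.2),
    ]

    def lm_log_prob(prev_word, word):
        """Stub language model; a real system would query e.g. the trigram
        model here, without redoing the acoustic recognition."""
        return -1.0

    # Dynamic programming over nodes in topological order:
    # best[node] = (total log score, last word, word sequence so far).
    best = {0: (0.0, None, [])}
    for node in range(1, 5):
        best[node] = max(
            (best[s][0] + ac + lm_log_prob(best[s][1], w), w, best[s][2] + [w])
            for (s, e, w, ac) in edges if e == node and s in best
        )

    print(best[4][2])  # best word sequence under combined acoustic + LM scores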
The primary application we consider
is large vocabulary speech
recognition, in tasks like
text dictation and automatic dialogue
systems.
Some of the techniques presented may also be useful
in other applications, such as systems
for voice commands and guided dialogues, where
a finite state network might be sufficient as a
language model.
For most non-experts, and maybe even for experts
in speech recognition, it is still a surprise
that the trigram language model performs as well as it
does.
In contrast,
grammar-based language models
(i.e. models based on linguistic grammars)
are far from being competitive at the present time.
Therefore, the description focusses on the trigram model
and related issues such as the sparse data problem
and smoothing.
This chapter is only able to touch upon some
of the issues in language modelling.
For other overviews, see [Jelinek (1991), Jelinek et al. (1991a), Jelinek et al. (1992)].
For related topics such as the use of stochastic methods for
language acquisition
and language understanding, see [Gorin et al. (1991)]
and [Pieraccini et al. (1993)], respectively.