This chapter gives an overview of
language modelling in the context of
large vocabulary speech
recognition
and covers the following topics:
- Why do we need language modelling in a speech recognition
system, and what are the particular problems?
- Stochastic language models
are introduced to
capture the inherent redundancy of the
language subset relevant for the specific recognition task.
- The definition of so-called perplexity
or, to be more exact, corpus perplexity is introduced as
a quantitative measure of the language constraints;
it depends on both the chosen language model and
the test corpus (a formal definition is given after this list).
- The need for smoothing techniques in language modelling
is discussed. Smoothing will be based on the
so-called leaving-one-out technique, which can be viewed
as a special type of cross-validation.
- To illustrate the specific problems in language
modelling, we consider smoothing methods for the widely and
successfully used bigram and trigram language
models.
- For smoothing, three techniques are presented in detail:
linear discounting,
linear interpolation and
absolute discounting (the basic formulas are sketched after this list).
- A detailed description of a full trigram model is given.
As an extension, the combination of a trigram model
with the so-called cache model is considered
(a small code sketch after this list illustrates the interpolation and cache ideas).
Experimental results are discussed for the
Wall Street Journal (WSJ) corpus along with
practical issues in the implementation
of language models.
- We describe refinements over the standard trigram model
by using word classes that are automatically learned.
We also discuss grammar-based language models.
In particular, so-called link grammars are
able to capture long-range dependencies,
as opposed to the short-range dependencies modelled
by trigram models.
- In a recognition system, the language model is used
during the decision process, which is usually referred to
as search in speech recognition.
To illustrate the integration of the language model
into the recognition procedure, we study
two search techniques in detail, namely
the search for the single best sentence
and
the generation of word graphs or lattices.
- The advantage of the word graph
is that
the acoustic recognition can be decoupled from
postprocessing steps like applying the language model
or the dialogue module (a minimal word graph rescoring sketch is given after this list).
Here significant progress has been achieved, so that
reliable and yet compact word graphs (lattices) can now be generated.
- In a final note (Section 7.8), we describe the mathematical
details of the smoothing techniques, in particular
the EM algorithm.
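To make the perplexity item above concrete: for a test corpus $w_1 \ldots w_N$ and a language model $p$, the corpus perplexity is defined as

\[
PP = \left[ \prod_{n=1}^{N} p(w_n \mid w_1 \ldots w_{n-1}) \right]^{-1/N} ,
\]

i.e. the inverse geometric mean of the word probabilities. Informally, the perplexity can be read as the average number of word choices the recognizer faces at each position, so a lower perplexity means tighter language constraints.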
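For orientation, the three smoothing techniques can be written as follows in one common formulation; the precise variants used in this chapter are developed later. For a trigram history $h = (u, v)$, let $N(h, w)$ be the training count of word $w$ after $h$, $N(h)$ the total count of $h$, and $f(\cdot)$ the corresponding relative frequencies. Linear discounting scales down all seen events by a factor $\lambda$ and gives the freed probability mass to the unseen events via a suitably normalized backing-off distribution $\beta$:

\[
p(w \mid h) =
\begin{cases}
(1 - \lambda)\, f(w \mid h) & \text{if } N(h, w) > 0 , \\
\lambda\, \beta(w) & \text{otherwise.}
\end{cases}
\]

Linear interpolation mixes trigram, bigram and unigram relative frequencies with weights summing to one:

\[
p(w \mid u, v) = \lambda_3\, f(w \mid u, v) + \lambda_2\, f(w \mid v) + \lambda_1\, f(w) .
\]

Absolute discounting subtracts a constant $b$ from each seen count, where $n_+(h)$ denotes the number of distinct words observed after $h$:

\[
p(w \mid h) = \frac{\max\{ N(h, w) - b,\, 0 \}}{N(h)} + \frac{b\, n_+(h)}{N(h)}\, \beta(w) .
\]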
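As an illustration only, here is a minimal Python sketch of linear interpolation and the cache extension. It is not the chapter's implementation, and the interpolation weights are invented placeholders rather than values estimated by leaving-one-out.

    from collections import Counter

    def train_counts(words):
        """Collect unigram, bigram and trigram counts from a training text."""
        uni = Counter(words)
        bi = Counter(zip(words, words[1:]))
        tri = Counter(zip(words, words[1:], words[2:]))
        return uni, bi, tri, len(words)

    def p_interp(u, v, w, uni, bi, tri, total, lambdas=(0.6, 0.3, 0.1)):
        """p(w | u, v) as a weighted sum of trigram, bigram and unigram
        relative frequencies. In practice the weights are estimated on
        held-out data (e.g. by leaving-one-out); here they are fixed."""
        l3, l2, l1 = lambdas
        f3 = tri[(u, v, w)] / bi[(u, v)] if bi[(u, v)] else 0.0
        f2 = bi[(v, w)] / uni[v] if uni[v] else 0.0
        f1 = uni[w] / total
        return l3 * f3 + l2 * f2 + l1 * f1

    def p_with_cache(static_prob, w, recent, mu=0.1):
        """Interpolate the static estimate with a cache estimate: the
        relative frequency of w among the most recent words."""
        cache = recent.count(w) / len(recent) if recent else 0.0
        return (1.0 - mu) * static_prob + mu * cache

For example, after uni, bi, tri, total = train_counts(text.split()), the combined estimate for a word w given its two predecessors u, v and a window recent of the last few hundred words is p_with_cache(p_interp(u, v, w, uni, bi, tri, total), w, recent).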
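Similarly, the decoupling offered by word graphs can be suggested in a few lines: the acoustic pass emits edges carrying acoustic scores, and the language model is applied in a separate rescoring pass over the graph. Everything below (the node numbering, the scores, the stub language model) is a hypothetical toy example, not the chapter's method.

    # Each edge of the word graph: (start_node, end_node, word, acoustic_log_score).
    edges = [
        (0, 1, "this", -2.0), (0, 1, "the", -1.5),
        (1, 2, "guy", -3.0), (1, 2, "sky", -2.5),
        (2, 3, "is", -1.0),
        (3, 4, "blue", -2.0), (3, 4, "glue", -2.2),
    ]

    def lm_log_prob(prev_word, word):
        """Stub language model; a real system would query e.g. the trigram
        model here, without redoing the acoustic recognition."""
        return -1.0

    # Dynamic programming over nodes in topological order:
    # best[node] = (total log score, last word, word sequence so far).
    best = {0: (0.0, None, [])}
    for node in range(1, 5):
        best[node] = max(
            (best[s][0] + ac + lm_log_prob(best[s][1], w), w, best[s][2] + [w])
            for (s, e, w, ac) in edges if e == node and s in best
        )

    print(best[4][2])  # best word sequence under combined acoustic + LM scores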
The primary application we consider
is large vocabulary speech
recognition, in tasks like
text dictation and automatic dialogue
systems.
Some of the techniques presented may also be useful
in other applications, such as systems
for voice commands and guided dialogues, where
a finite state network might be sufficient as a
language model.
For most non-experts, and maybe even for experts
in speech recognition, it is still a surprise
that the trigram language model performs as well as it
does.
In contrast,
grammar-based language models
(i.e. models based on linguistic grammars)
are far from being competitive at the present time.
Therefore, the description focusses on the trigram model
and related issues such as the sparse data problem
and smoothing.
This chapter is only able to touch upon some
of the issues in language modelling.
For other overviews, see [Jelinek (1991), Jelinek et al. (1991a), Jelinek et al. (1992)].
For related topics such as the use of stochastic methods for
language acquisition
and language understanding, see [Gorin et al. (1991)]
and [Pieraccini et al. (1993)], respectively.