Types of language models


To illustrate the broad range of language model types, we mention some typical examples:

no or uniform language model: Here, the idea is to use the same probability for all events; the events can be either the words of the vocabulary or, if the number of sentences is limited, the sentences themselves. If all words are equiprobable, there is an implied model for the duration of a sentence: with a vocabulary of $W$ words, a sentence of $N$ words then has a probability $W^{-N}$.
finite-state language model: The set of legal word sequences is represented as a finite-state network (or regular grammar) whose edges stand for the spoken words, i.e. each path through the network results in a legal word sequence. To make this approach correct from a probabilistic point of view, the edges have to be assigned probabilities.
m-gram language models: In m-gram language models, all word sequences are possible, and the probability of the predicted word depends only on its (m-1) immediate predecessor words (see above).
grammar-based language models: Typically, these models are based on variants of stochastic context-free grammars or other phrase structure grammars.
other types: There are language models that make use of still other concepts like CART (classification and regression trees) [Breiman et al. (1984), Bahl et al. (1989)] and maximum entropy [Lau et al. (1993), Rosenfeld (1994)].
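As a rough illustration of the first three types, the following Python sketch computes sentence probabilities under a uniform model, a small finite-state network with edge probabilities, and a bigram (m = 2) model. Everything in it is an assumption made for illustration and is not taken from the handbook: the function names, the toy network `FSN`, the sentence boundary symbols `<s>` and `</s>`, and in particular the add-one smoothing, which is used here only as the simplest way to ensure that every word sequence receives a non-zero probability.

```python
from collections import Counter

def uniform_sentence_prob(n_words, vocab_size):
    """Uniform model: each word has probability 1/W, so N words have W**-N."""
    return vocab_size ** (-n_words)

# Hypothetical toy finite-state network: state -> {word: (next_state, probability)}.
# The probabilities on the edges leaving each state sum to 1.
FSN = {
    "start": {"show": ("verb", 0.5), "list": ("verb", 0.5)},
    "verb": {"flights": ("end", 0.75), "fares": ("end", 0.25)},
}

def fsn_sentence_prob(words, network=FSN, start="start", final="end"):
    """Multiply edge probabilities along the path; illegal sequences get 0."""
    state, prob = start, 1.0
    for word in words:
        edges = network.get(state, {})
        if word not in edges:
            return 0.0
        state, edge_prob = edges[word]
        prob *= edge_prob
    return prob if state == final else 0.0

def train_bigram(sentences):
    """Collect bigram and history counts from tokenised sentences."""
    pair_counts, history_counts = Counter(), Counter()
    vocab = {"</s>"}  # the end symbol can be predicted like a word
    for sentence in sentences:
        tokens = ["<s>"] + list(sentence) + ["</s>"]
        vocab.update(sentence)
        for prev, cur in zip(tokens, tokens[1:]):
            pair_counts[(prev, cur)] += 1
            history_counts[prev] += 1
    return pair_counts, history_counts, vocab

def bigram_prob(cur, prev, model):
    """Add-one smoothed estimate of p(cur | prev) (an illustrative choice)."""
    pair_counts, history_counts, vocab = model
    return (pair_counts[(prev, cur)] + 1) / (history_counts[prev] + len(vocab))

def bigram_sentence_prob(words, model):
    """Product of conditional probabilities, each using one predecessor word."""
    tokens = ["<s>"] + list(words) + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= bigram_prob(cur, prev, model)
    return prob

if __name__ == "__main__":
    model = train_bigram([["show", "flights"], ["list", "fares"]])
    print(uniform_sentence_prob(2, 4))
    print(fsn_sentence_prob(["show", "flights"]))
    print(bigram_sentence_prob(["show", "fares"], model))
```

Note the contrast the sketch makes visible: the finite-state network assigns probability zero to any sequence that is not a path through the network, whereas the smoothed bigram model gives a small but non-zero probability even to word pairs that never occurred in its training sentences.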

This classification of language models is not exhaustive, and a specific language model may belong to several types.

EAGLES SWLG SoftEdition, May 1997.