Here, we give recommendations on how to use language
models in a specific speech recognition application:
-  Today, by far the most successful method in language
      modelling is the bigram and trigram approach.
      If there are enough training data,
      a trigram model should be used; otherwise a bigram model might
      be sufficient.
-  Smoothing of the models is always required.
      When smoothing a trigram model with a bigram model,
      or a bigram model with a unigram model,
      our experience is that the method
      of absolute discounting is the method of choice:
      it is simple and robust with respect to the choice
      of the smoothing parameters.
      The backing-off method introduced by [Katz (1987)] produces
      comparable results at a somewhat higher effort.
-  The use of improved backing-off distributions like
      the singleton distribution plays only a minor role.
-  In any application, it should be checked whether
      the cache effect applies. Examples of such applications
      are text dictation and maybe dialogue
      systems. In these cases, the cache model
      should be combined with the baseline trigram model.
-  When combining language models from "different sources",
      linear interpolation is the method of choice.
      Only in rare cases will it be necessary to go to
      the trouble of performing a full training with
      the EM algorithm. Even then it will be necessary in most
      cases to reduce the total number of independent
      interpolation parameters by tying.
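As a rough illustration of the recommended absolute discounting scheme, the sketch below smooths a bigram model with a unigram backing-off distribution: a constant discount d is subtracted from every observed count, and the freed probability mass is redistributed over the unigram distribution. The function name, the choice d = 0.5, and the data structures are illustrative assumptions, not taken from the text.

```python
def absolute_discounting(bigram_counts, unigram_probs, d=0.5):
    """Smooth bigram probabilities by absolute discounting.

    bigram_counts: dict mapping history word -> {successor word: count}
    unigram_probs: backing-off distribution, dict word -> probability
    d: discount subtracted from every seen count (illustrative value)
    """
    probs = {}
    for history, counts in bigram_counts.items():
        total = sum(counts.values())
        # Mass freed by discounting: d per distinct seen successor,
        # redistributed in proportion to the unigram distribution.
        lam = d * len(counts) / total
        probs[history] = {
            w: max(counts.get(w, 0) - d, 0) / total + lam * unigram_probs[w]
            for w in unigram_probs
        }
    return probs
```

Because the discounted relative frequencies and the redistributed mass add up exactly, each smoothed distribution sums to one, which makes the method robust to the exact choice of d, in line with the recommendation above.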
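The full EM training mentioned in the last point can be sketched as follows: the interpolation weights of the component models are re-estimated iteratively on held-out data. This is a minimal, hypothetical sketch; the names and the omission of parameter tying are simplifying assumptions.

```python
def em_interpolation(init_weights, model_probs, held_out, iters=20):
    """Estimate linear interpolation weights by EM on held-out data.

    init_weights: initial weights, one per component model (sum to 1)
    model_probs: list of dicts, word -> probability under each model
    held_out: list of held-out words used to re-estimate the weights
    """
    lam = list(init_weights)
    for _ in range(iters):
        # E-step: accumulate each model's posterior responsibility
        # for every held-out word under the current mixture.
        resp = [0.0] * len(lam)
        for w in held_out:
            mix = sum(l * m[w] for l, m in zip(lam, model_probs))
            for i, (l, m) in enumerate(zip(lam, model_probs)):
                resp[i] += l * m[w] / mix
        # M-step: renormalise the responsibilities into new weights.
        total = sum(resp)
        lam = [r / total for r in resp]
    return lam
```

In practice the number of free weights is reduced by tying (sharing one weight across many histories), as the text recommends; the sketch above estimates a single untied weight vector.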
 
EAGLES SWLG SoftEdition, May 1997.