
Recommendations: m-gram language models


Here, we give recommendations on how to use language models in a specific speech recognition application:

  1. Today, by far the most successful method in language modelling is the bigram and trigram approach. If there are enough training data, a trigram model should be used; otherwise a bigram model might be sufficient.
  2. Smoothing of the models is always required. When smoothing a trigram model with a bigram model, or a bigram model with a unigram model, our experience is that absolute discounting is the method of choice: it is simple and robust with respect to the choice of the smoothing parameters. The backing-off method introduced by [Katz (1987)] produces comparable results at somewhat higher effort.
  3. The use of improved backing-off distributions such as the singleton distribution plays only a minor role.
  4. In any application, it should be checked whether the cache effect applies. Examples of such applications are text dictation and possibly dialogue systems. In these cases, the cache model should be combined with the baseline trigram model.
  5. When combining language models from "different sources", linear interpolation is the method of choice. Only in rare cases will it be necessary to go to the trouble of performing a full training with the EM algorithm. Even then it will be necessary in most cases to reduce the total number of independent interpolation parameters by tying.
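The absolute discounting scheme recommended in point 2 can be sketched for the bigram/unigram case as follows. Every seen bigram count is reduced by a fixed discount d, and the freed probability mass is redistributed over the unigram (backing-off) distribution. The function name and the default value d = 0.5 are illustrative assumptions, not part of the original text.

```python
from collections import Counter, defaultdict

def train_bigram_absolute_discount(tokens, d=0.5):
    """Bigram model smoothed with a unigram model by absolute discounting.

    Every seen bigram count N(v, w) is reduced by a fixed discount
    0 < d < 1; the mass freed this way is redistributed over the
    unigram distribution.  A minimal sketch, not a production model.
    """
    unigram = Counter(tokens)
    total = len(tokens)
    bigram = Counter(zip(tokens, tokens[1:]))
    hist_count = Counter(tokens[:-1])          # N(v): occurrences of v as a history
    followers = defaultdict(set)               # distinct successor words of v
    for v, w in bigram:
        followers[v].add(w)

    def prob(w, v):
        """p(w | v) with absolute discounting, backing off to unigrams."""
        p_uni = unigram[w] / total
        n_v = hist_count[v]
        if n_v == 0:                           # unseen history: pure unigram
            return p_uni
        discounted = max(bigram[(v, w)] - d, 0.0) / n_v
        backoff_mass = d * len(followers[v]) / n_v
        return discounted + backoff_mass * p_uni

    return prob
```

By construction the discounted bigram part and the redistributed backing-off mass sum to one for every seen history, which is the normalization property that makes the scheme robust to the exact choice of d.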
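The cache combination of point 4 is itself a linear interpolation: a unigram estimate over the most recent words is mixed with the static baseline model. A minimal sketch follows; the class name, the fixed weight `lam`, and the cache size are illustrative assumptions.

```python
from collections import Counter, deque

class CachedLM:
    """Linear interpolation of a static model with a unigram cache.

    p(w | h) = (1 - lam) * p_static(w | h) + lam * p_cache(w),
    where p_cache is a unigram estimate over the last `size` words.
    The weight lam and cache size are illustrative, not tuned values.
    """
    def __init__(self, static_prob, lam=0.1, size=200):
        self.static_prob = static_prob      # callable: (w, history) -> prob
        self.lam = lam
        self.cache = deque(maxlen=size)     # sliding window of recent words

    def prob(self, w, history):
        p_static = self.static_prob(w, history)
        if not self.cache:                  # empty cache: static model only
            return p_static
        p_cache = Counter(self.cache)[w] / len(self.cache)
        return (1 - self.lam) * p_static + self.lam * p_cache

    def observe(self, w):
        """Record a recognized word so that recent words get boosted."""
        self.cache.append(w)
```

In a dictation setting, `observe` would be called on each recognized word, so that recently dictated words receive a higher probability than the baseline trigram model alone would assign.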
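When a full EM training of the interpolation weights (point 5) is warranted, the update for a single tied weight has a simple closed form on held-out data; tying means that many histories share this one weight. The sketch below shows the standard EM iteration for two models; the function name and argument conventions are assumptions for illustration.

```python
def em_interpolation_weight(p1, p2, held_out, lam=0.5, iters=20):
    """Estimate the weight lam in (1 - lam) * p1 + lam * p2 by EM.

    p1 and p2 map each held-out event to its probability under the two
    component models.  Each iteration applies the closed-form EM update:
    lam becomes the average posterior probability that model 2 generated
    a held-out event.  Held-out likelihood increases monotonically.
    """
    for _ in range(iters):
        posts = [lam * p2(w) / ((1 - lam) * p1(w) + lam * p2(w))
                 for w in held_out]
        lam = sum(posts) / len(posts)
    return lam
```

Estimating the weight on held-out rather than training data is essential: on the training data itself the component models look deceptively good, and the interpolation weight would be biased toward the better-fitting (typically higher-order) model.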

EAGLES SWLG SoftEdition, May 1997.