Here, we give recommendations on how to use language
models in a specific speech recognition application:
-  Today, by far the most successful method in language
      modelling is the bigram and trigram approach.
      If there are enough training data,
      a trigram model should be used; otherwise a bigram model might
      be sufficient.
-  Smoothing of the models is always required.
      When smoothing a trigram model with a bigram model,
      or a bigram model with a unigram model,
      our experience is that the method
      of absolute discounting is the method of choice:
      it is simple and robust with respect to the choice
      of the smoothing parameters.
      The backing-off method introduced by [Katz (1987)] produces
      comparable results at a somewhat higher effort.
-  The use of improved backing-off distributions like
      the singleton distribution plays only a minor role.
-  In any application, it should be checked whether
      the cache effect applies. Examples of such applications
      are text dictation and maybe dialogue
      systems. In these cases, the cache model
      should be combined with the baseline trigram model.
-  When combining language models from "different sources",
      linear interpolation is the method of choice.
      Only in rare cases will it be necessary to go to
      the trouble of performing a full training with
      the EM algorithm. Even then it will be necessary in most
      cases to reduce the total number of independent
      interpolation parameters by tying.
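As a rough illustration of the recommended absolute discounting scheme, the sketch below smooths a bigram model with a unigram backing-off distribution: a constant discount d is subtracted from every observed count, and the freed probability mass is redistributed over the unigram distribution. The function name, the choice d = 0.5, and the data structures are illustrative assumptions, not taken from the text.

```python
def absolute_discounting(bigram_counts, unigram_probs, d=0.5):
    """Smooth bigram probabilities by absolute discounting.

    bigram_counts: dict mapping history word -> {successor word: count}
    unigram_probs: backing-off distribution, dict word -> probability
    d: discount subtracted from every seen count (illustrative value)
    """
    probs = {}
    for history, counts in bigram_counts.items():
        total = sum(counts.values())
        # Mass freed by discounting: d per distinct seen successor,
        # redistributed in proportion to the unigram distribution.
        lam = d * len(counts) / total
        probs[history] = {
            w: max(counts.get(w, 0) - d, 0) / total + lam * unigram_probs[w]
            for w in unigram_probs
        }
    return probs
```

Because the discounted relative frequencies and the redistributed mass add up exactly, each smoothed distribution sums to one, which makes the method robust to the exact choice of d, in line with the recommendation above.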
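The full EM training mentioned in the last point can be sketched as follows: the interpolation weights of the component models are re-estimated iteratively on held-out data. This is a minimal, hypothetical sketch; the names and the omission of parameter tying are simplifying assumptions.

```python
def em_interpolation(init_weights, model_probs, held_out, iters=20):
    """Estimate linear interpolation weights by EM on held-out data.

    init_weights: initial weights, one per component model (sum to 1)
    model_probs: list of dicts, word -> probability under each model
    held_out: list of held-out words used to re-estimate the weights
    """
    lam = list(init_weights)
    for _ in range(iters):
        # E-step: accumulate each model's posterior responsibility
        # for every held-out word under the current mixture.
        resp = [0.0] * len(lam)
        for w in held_out:
            mix = sum(l * m[w] for l, m in zip(lam, model_probs))
            for i, (l, m) in enumerate(zip(lam, model_probs)):
                resp[i] += l * m[w] / mix
        # M-step: renormalise the responsibilities into new weights.
        total = sum(resp)
        lam = [r / total for r in resp]
    return lam
```

In practice the number of free weights is reduced by tying (sharing one weight across many histories), as the text recommends; the sketch above estimates a single untied weight vector.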
 
EAGLES SWLG SoftEdition, May 1997.