Here, we give recommendations on how to use language
models for a specific application in speech recognition:
- Today, by far the most successful method in language
modelling is the bigram and trigram approach.
If there are enough training data,
a trigram model should be used; otherwise a bigram model
may be sufficient (a minimal trigram estimate is
sketched after this list).
- Smoothing of the models is always required.
When smoothing a trigram model with a bigram model,
or a bigram model with a unigram model,
our experience is that absolute discounting is the
method of choice: it is simple and robust with respect
to the choice of the smoothing parameters
(a discounting sketch follows this list).
The backing-off method introduced by [Katz (1987)] produces
comparable results at somewhat higher effort.
- Improved backing-off distributions, such as
the singleton distribution, play only a minor role.
- In any application, it should be checked whether
the cache effect applies; text dictation and perhaps
dialogue systems are examples of such applications.
In these cases, the cache model should be combined with
the baseline trigram model, as in the cache sketch
after this list.
- When combining language models from ``different sources'',
linear interpolation is the method of choice.
Only in rare cases will it be necessary to go to
the trouble of performing a full training with
the EM algorithm. Even then, it will in most
cases be necessary to reduce the total number of independent
interpolation parameters by tying (an EM sketch for the
interpolation weights closes this section).
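
To make the first recommendation concrete, here is a minimal
sketch of how trigram counts and the corresponding unsmoothed
maximum-likelihood estimate could be computed; the function
names and the token-sequence interface are assumptions for
illustration, and the raw estimate is only the starting point
for the smoothing discussed above.

    from collections import defaultdict

    def trigram_counts(words):
        """Collect trigram counts N(u, v, w) and context counts N(u, v)."""
        tri = defaultdict(int)
        ctx = defaultdict(int)
        for u, v, w in zip(words, words[1:], words[2:]):
            tri[(u, v, w)] += 1
            ctx[(u, v)] += 1
        return tri, ctx

    def ml_trigram_prob(tri, ctx, u, v, w):
        """Unsmoothed maximum-likelihood estimate of P(w | u, v)."""
        if ctx[(u, v)] == 0:
            return 0.0               # unseen history: must be smoothed
        return tri[(u, v, w)] / ctx[(u, v)]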
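
The absolute discounting recommended above can be sketched as
follows: a constant d is subtracted from every observed count,
and the freed probability mass is given to the lower-order
(back-off) distribution. The discount value and the count
tables are illustrative assumptions; the back-off distribution
would itself be a bigram model smoothed in the same way.

    def absolute_discount(counts, context_total, d, backoff_prob):
        """Absolutely discounted estimate for one fixed history h.

        counts        : dict mapping word w to N(h, w)
        context_total : N(h), the total count of the history h
        d             : discount constant, 0 < d < 1
        backoff_prob  : function w -> lower-order probability
        """
        n_plus = len(counts)              # distinct observed successors of h
        lam = d * n_plus / context_total  # mass moved to the back-off model
        def prob(w):
            seen = max(counts.get(w, 0) - d, 0.0) / context_total
            return seen + lam * backoff_prob(w)
        return prob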
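
For the cache effect, a simple and common choice is to
interpolate the baseline trigram model linearly with a unigram
cache over the most recent words. The window size, the
interpolation weight, and the class interface below are
assumptions for illustration.

    from collections import deque, Counter

    class CachedLM:
        """Linear interpolation of a baseline model with a unigram cache."""

        def __init__(self, trigram_prob, vocab_size, cache_size=200, weight=0.9):
            self.trigram_prob = trigram_prob  # function (w, history) -> P(w | history)
            self.vocab_size = vocab_size
            self.weight = weight              # weight of the baseline trigram model
            self.cache = deque(maxlen=cache_size)
            self.freq = Counter()

        def prob(self, w, history):
            if self.cache:
                p_cache = self.freq[w] / len(self.cache)
            else:
                p_cache = 1.0 / self.vocab_size   # uniform before any word is seen
            return (self.weight * self.trigram_prob(w, history)
                    + (1.0 - self.weight) * p_cache)

        def observe(self, w):
            if len(self.cache) == self.cache.maxlen:
                self.freq[self.cache[0]] -= 1     # word about to leave the window
            self.cache.append(w)
            self.freq[w] += 1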
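
Finally, when the weights of linearly interpolated models are
trained at all, the EM algorithm reduces to repeatedly
renormalising the posterior mass of each component on held-out
data. The sketch below estimates one tied weight per component
model, in line with the recommendation to tie parameters; the
model interface and the iteration count are assumptions.

    def em_interpolation_weights(models, heldout, iterations=20):
        """EM estimation of tied linear-interpolation weights.

        models  : list of functions, each mapping a word to a probability
                  (any history handling is assumed to live in the closure)
        heldout : list of words from a held-out corpus
        """
        m = len(models)
        lam = [1.0 / m] * m                    # start from uniform weights
        for _ in range(iterations):
            expected = [0.0] * m
            n = 0
            for w in heldout:
                joint = [lam[i] * models[i](w) for i in range(m)]
                total = sum(joint)
                if total == 0.0:
                    continue                   # skip words no model covers
                for i in range(m):
                    expected[i] += joint[i] / total  # E-step: posterior
                n += 1
            lam = [e / n for e in expected]    # M-step: renormalise
        return lam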