
Recommendations: Refined language models

Here, some recommendations are given for the use of refined language models in specific recognition tasks:

  1. Experimental experience shows that none of the usual language model refinements is likely to reduce the perplexity by more than 10% over a standard trigram model (or a bigram model, if the amount of training data is small). Therefore, in all applications it should first be checked whether a trigram model in combination with a cache component (see the first sketch after this list) does not already do the job. In a number of recognition tasks, the perplexity improvements offered by such refinements are not worth the additional effort with today's algorithms.
  2. There may be particular applications where the amount of training data is very small. In these cases, it can be useful to base the language model on word classes rather than on the words themselves (see the second sketch below). These classes can be defined either by an automatic clustering procedure or by linguistic prior knowledge, e.g. parts of speech (POS).
  3. If two language models of different types are to be combined, e.g. a word bigram model and a class bigram model, the first choice should be a linear interpolation of the two models (see the third sketch below).
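
As an illustration of the first recommendation, the following is a minimal sketch in Python of a trigram model combined with a unigram cache component by linear interpolation. The trigram probability table, the cache size, and the interpolation weight lam are illustrative assumptions, not part of the handbook.

    from collections import Counter, deque

    class CacheTrigram:
        """Static trigram model interpolated with a unigram cache (sketch)."""

        def __init__(self, trigram_probs, vocab_size, cache_size=200, lam=0.9):
            self.trigram_probs = trigram_probs     # {(u, v, w): p(w | u, v)}
            self.vocab_size = vocab_size
            self.cache = deque(maxlen=cache_size)  # most recent words
            self.counts = Counter()                # word frequencies in the cache
            self.lam = lam                         # weight of the static trigram

        def prob(self, u, v, w):
            # Static estimate; unseen trigrams fall back to a uniform floor.
            p_tri = self.trigram_probs.get((u, v, w), 1.0 / self.vocab_size)
            # Cache estimate: relative frequency of w in the recent history.
            p_cache = self.counts[w] / len(self.cache) if self.cache else 0.0
            return self.lam * p_tri + (1.0 - self.lam) * p_cache

        def observe(self, w):
            # Update the cache with each newly recognized word.
            if len(self.cache) == self.cache.maxlen:
                self.counts[self.cache[0]] -= 1    # word about to be evicted
            self.cache.append(w)
            self.counts[w] += 1

The cache exploits the tendency of recently used words to recur; in practice the weight lam would be tuned on held-out data, e.g. as in the third sketch below.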

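For the second recommendation, here is a minimal sketch of a class bigram model, using the standard decomposition p(w | v) = p(w | c(w)) * p(c(w) | c(v)), where c maps each word to its class. The word-to-class mapping and the probability tables below are hypothetical toy values for illustration only.

    def class_bigram_prob(w, v, cls, p_w_given_c, p_c_given_c):
        """p(w | v) = p(w | c(w)) * p(c(w) | c(v))."""
        return p_w_given_c[(w, cls[w])] * p_c_given_c[(cls[w], cls[v])]

    # Toy illustration: two POS-like classes with made-up probabilities.
    cls = {"the": "DET", "a": "DET", "cat": "NOUN", "dog": "NOUN"}
    p_w_given_c = {("cat", "NOUN"): 0.5, ("dog", "NOUN"): 0.5,
                   ("the", "DET"): 0.6, ("a", "DET"): 0.4}
    p_c_given_c = {("NOUN", "DET"): 0.8, ("DET", "NOUN"): 0.1}

    print(class_bigram_prob("cat", "the", cls, p_w_given_c, p_c_given_c))  # 0.4

Because only class transition probabilities and per-class word probabilities must be estimated, far fewer parameters are trained than for a word bigram, which is why this helps when training data is scarce.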

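For the third recommendation, the following sketch estimates the weight of a linear interpolation of two models on held-out data, using the standard EM re-estimation for interpolation weights. The function names and the two model callbacks p1 and p2 are assumptions for illustration, not from the handbook.

    def em_interpolation_weight(p1, p2, heldout, iters=20, lam=0.5):
        # p1, p2: callables returning each model's probability for an event;
        # both are assumed strictly positive on the held-out events.
        for _ in range(iters):
            posterior_sum = 0.0
            for e in heldout:
                a = lam * p1(e)
                b = (1.0 - lam) * p2(e)
                posterior_sum += a / (a + b)    # posterior that model 1 produced e
            lam = posterior_sum / len(heldout)  # re-estimated weight of model 1
        return lam

    def interpolated_prob(e, p1, p2, lam):
        return lam * p1(e) + (1.0 - lam) * p2(e)

Each EM iteration cannot decrease the held-out likelihood, and for a single interpolation weight the held-out log-likelihood is concave, so the procedure converges to the optimal weight.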
