
The full trigram model


To illustrate all the details involved in smoothing a trigram language model, we write down the full set of equations. Any of the three smoothing methods can be used for this purpose; here we consider only absolute discounting in connection with interpolation. The same operation has to be applied at the levels of bigrams and unigrams. Without using the singleton distribution, we obtain the full trigram model:
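The general form of such a model can be sketched as follows (the notation is ours and may differ from that of the preceding sections): $N(\cdot)$ denotes training counts, $N$ the corpus size, $W$ the vocabulary size, $W_3(u,v)$, $W_2(v)$ and $W_1$ the numbers of distinct words observed after the history $(u,v)$, after $v$, and in the whole corpus, respectively, and $b_3, b_2, b_1$ the discounting parameters at the three levels:

\begin{eqnarray*}
p(w|u,v) & = & \frac{\max\{N(u,v,w)-b_3,\,0\}}{N(u,v)} \;+\; b_3\,\frac{W_3(u,v)}{N(u,v)}\;\beta(w|v) \\
\beta(w|v) & = & \frac{\max\{N(v,w)-b_2,\,0\}}{N(v)} \;+\; b_2\,\frac{W_2(v)}{N(v)}\;\gamma(w) \\
\gamma(w) & = & \frac{\max\{N(w)-b_1,\,0\}}{N} \;+\; b_1\,\frac{W_1}{N}\cdot\frac{1}{W}
\end{eqnarray*}

Each level hands the probability mass removed by discounting down to the next lower level, so every distribution sums to one by construction.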


The advantage of interpolation in comparison with backing-off is that the computationally costly renormalisation is avoided; experiments show that the perplexities are virtually unaffected. Smoothing at the level of unigrams is required only in exceptional cases, when the size of the training corpus is extremely small. In all other conditions, the unigram distribution is left unsmoothed, i.e. the unigram relative frequencies are used directly.
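To make the recursive structure concrete, here is a minimal sketch in Python; the function and variable names (train_counts, trigram_prob, b3, b2) are ours, the discount values are arbitrary placeholders, and the unigram level is left unsmoothed as discussed above:

from collections import Counter, defaultdict

def train_counts(words):
    """Trigram, bigram and unigram counts, plus the set of distinct
    successor words observed after each bigram and unigram history."""
    uni = Counter(words)
    bi  = Counter(zip(words, words[1:]))
    tri = Counter(zip(words, words[1:], words[2:]))
    succ_bi, succ_uni = defaultdict(set), defaultdict(set)
    for (u, v, w) in tri:
        succ_bi[(u, v)].add(w)
    for (v, w) in bi:
        succ_uni[v].add(w)
    return uni, bi, tri, succ_bi, succ_uni

def trigram_prob(w, u, v, uni, bi, tri, succ_bi, succ_uni, b3=0.7, b2=0.7):
    """p(w | u, v) by absolute discounting with interpolation.
    The mass removed at each level (discount times the number of distinct
    successors, divided by the history count) is handed down to the next
    lower level, so no explicit renormalisation is needed.  The unigram
    level is left unsmoothed (discount zero); assumes a non-empty corpus."""
    p = uni[w] / sum(uni.values())                   # unigram relative frequency

    n_v = sum(bi[(v, x)] for x in succ_uni[v])       # N(v) as a bigram history
    if n_v > 0:
        p = max(bi[(v, w)] - b2, 0) / n_v + b2 * len(succ_uni[v]) / n_v * p

    n_uv = sum(tri[(u, v, x)] for x in succ_bi[(u, v)])   # N(u, v) as a history
    if n_uv > 0:
        p = max(tri[(u, v, w)] - b3, 0) / n_uv + b3 * len(succ_bi[(u, v)]) / n_uv * p

    return p

# Example: over the observed vocabulary, the probabilities sum to one.
model = train_counts("the cat sat on the mat the cat ate".split())
print(trigram_prob("sat", "the", "cat", *model))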

Note that we have not used the singleton distribution. If it is used, the normal bigram counts have to be replaced by the corresponding singleton counts, that is:
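This replacement can be sketched as follows (again in our own notation): writing $N_1(\cdot,v,w)$ for the number of words $u$ such that the trigram $(u,v,w)$ has been seen exactly once, the singleton bigram distribution is

\begin{displaymath}
\beta_1(w|v) \;=\; \frac{N_1(\cdot,v,w)}{\sum_{w'} N_1(\cdot,v,w')}\;,
\end{displaymath}

and it is this distribution, rather than the one estimated from the full bigram counts $N(v,w)$, that is interpolated with the discounted trigram counts.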


When using the singleton distribution, we have to be careful in applying the leaving-one-out concept, because the set of events is no longer the full set of training data but only the corresponding singleton subset. As to the choice of history-dependent versus history-independent discounting parameters, experimental tests show that the history dependence does not pay off, at least for absolute discounting. It is therefore sufficient to have only two (!) discounting parameters, namely one for the bigram model and one for the trigram model.
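A closed-form value for such a corpus-wide discounting parameter can be obtained from leaving-one-out estimation; the formula below is the standard one for absolute discounting and is quoted from general knowledge rather than from this section, with $n_1$ and $n_2$ denoting the numbers of distinct events seen exactly once and exactly twice at the level in question:

\begin{displaymath}
b \;\approx\; \frac{n_1}{n_1 + 2\,n_2}\;.
\end{displaymath}

Evaluated separately on the bigram and on the trigram count-of-counts, this yields the two discounting parameters mentioned above.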
