The basic idea is to subtract a constant from all counts r>0 and thus, in particular, to leave the high counts virtually intact. The intuitive justification is that a particular event that has been seen exactly r times in the training data is likely to occur r-1, r or r+1 times in a new set of data. Therefore, we assume a model where the counts r are modified by an additive offset. From the normalisation constraint, it immediately follows that this must be a negative constant since the unseen events require a non-zero probability. Experimental results in [Ney & Essen (1993)] show that the resulting estimates are close to estimates obtained from the Turing-Good formula after suitable smoothing [Good (1953), Nadas (1985)]. We define the model for absolute discounting:
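As a minimal sketch in assumed notation (the symbols here are illustrative and need not match those used elsewhere in the text): let $N(h,w)$ be the training count of word $w$ after history $h$, $N(h)=\sum_w N(h,w)$, $W$ the vocabulary size, $W-n_0(h)$ the number of distinct words observed after $h$, $\beta(\cdot)$ a distribution over words used for the unseen events, and $0<b<1$ the discounting constant. The model then has a form such as

$$
p(w \mid h) \;=\;
\begin{cases}
\dfrac{N(h,w)-b}{N(h)}, & \text{if } N(h,w)>0,\\[2ex]
b\,\dfrac{W-n_0(h)}{N(h)}\cdot\dfrac{\beta(w)}{\sum_{w':\,N(h,w')=0}\beta(w')}, & \text{if } N(h,w)=0.
\end{cases}
$$

The discounted mass $b\,(W-n_0(h))/N(h)$ collected from the seen words is exactly the mass redistributed over the unseen words, so the distribution is properly normalised.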
We perform the same manipulations as for linear discounting, i.e. separating the singletons, ordering and carrying out the sums. For the distribution over the unseen events, we obtain the same equation as for linear discounting. For the part that depends on the discounting constant, we obtain the following leaving-one-out log-likelihood function:
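In the same assumed notation, write $n_r$ for the number of events observed exactly $r$ times in the training data. Leaving out one occurrence of an event seen $r\ge 2$ times leaves it with count $r-1$, so it is predicted with a probability proportional to $r-1-b$; leaving out one of the $n_1$ singletons makes it an unseen event, whose probability is proportional to $b$. A sketch of the resulting $b$-dependent part of the leaving-one-out log-likelihood is therefore

$$
F(b) \;=\; n_1\,\log b \;+\; \sum_{r\ge 2} r\,n_r\,\log(r-1-b) \;+\; \text{const},
$$

where the constant collects all terms that do not depend on the discounting constant $b$.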
Taking the partial derivative with respect to the discounting constant, we obtain the following equation after separating the term with r=2:
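Continuing the sketch above, setting $\partial F/\partial b = 0$ and taking the $r=2$ term (for which $r-1-b = 1-b$) out of the sum yields a condition of the form

$$
\frac{n_1}{b} \;=\; \frac{2\,n_2}{1-b} \;+\; \sum_{r\ge 3}\frac{r\,n_r}{r-1-b}.
$$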
For this equation, there is no closed-form solution.
However, there are upper and lower bounds on the discounting constant; both are derived in the appendix.
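To indicate the kind of bound involved: in the sketched condition above, the sum over $r\ge 3$ is non-negative, so $n_1/b \ge 2n_2/(1-b)$, which gives

$$
b \;\le\; \frac{n_1}{n_1+2\,n_2},
$$

the value that is widely used in practice as the absolute-discounting estimate. A minimal Python sketch of this computation follows; the function names and the bisection routine are illustrative only, and assume $n_1>0$ and $n_2>0$.

from collections import Counter

def count_of_counts(event_counts):
    """n_r: how many distinct events were observed exactly r times."""
    return Counter(event_counts)

def discount_upper_bound(n):
    """Upper bound b <= n1/(n1 + 2*n2), obtained by dropping the r >= 3 terms."""
    return n[1] / (n[1] + 2 * n[2])

def loo_derivative(b, n):
    """Derivative of the sketched leaving-one-out log-likelihood:
    n1/b - sum_{r>=2} r*n_r / (r - 1 - b)."""
    return n[1] / b - sum(r * nr / (r - 1 - b) for r, nr in n.items() if r >= 2)

def estimate_discount(n, lo=1e-9, hi=1.0 - 1e-9, iters=80):
    """Bisection for the zero of the derivative, which is strictly
    decreasing in b on (0, 1)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if loo_derivative(mid, n) > 0:
            lo = mid  # derivative still positive: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example with the counts of all distinct events (e.g. bigrams) in some data set:
n = count_of_counts([1, 1, 1, 2, 2, 3, 5, 7])
print(discount_upper_bound(n))   # ~0.43
print(estimate_discount(n))      # ~0.30, below the upper bound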