
Linear discounting and backing-off

   

The model of linear discounting in conjunction with backing-off [Katz (1987), Jelinek (1991)] has the advantage that it results in relatively simple formulae. The model is:


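\[
p(w|h) = \left\{
\begin{array}{ll}
(1-\lambda_h) \, \frac{\displaystyle N(h,w)}{\displaystyle N(h)} & \mbox{if } N(h,w) > 0 \\[2ex]
\lambda_h \, \frac{\displaystyle \beta(w|\bar{h})}{\displaystyle \sum_{w':\, N(h,w')=0} \beta(w'|\bar{h})} & \mbox{if } N(h,w) = 0
\end{array}
\right.
\]

where $N(h,w)$ is the count of the event $(h,w)$ in the training data and $N(h) = \sum_w N(h,w)$.

Here we have two types of parameters to be estimated: the discounting parameters $\lambda_h$, one for each history $h$, and the backing-off distribution $\beta(w|\bar{h})$ for each generalised history $\bar{h}$.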
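
As a rough illustration of the two-branch structure of the model, the following Python sketch computes the probability of a word given a history: a seen word keeps its relative frequency scaled by one minus the discounting parameter, and an unseen word receives a share of the discounted mass in proportion to its backing-off score. The function name, the toy counts, the discounting value 0.25 and the backing-off scores are assumptions made for this example only, and the generalised history is assumed to be empty (a unigram back-off).

```python
def linear_discount_prob(w, h, counts, lam, beta, vocab):
    """Probability of w given h under linear discounting with backing-off.

    counts: dict mapping (h, w) -> count of the event (h, w)
    lam:    dict mapping h -> discounting parameter in (0, 1)
    beta:   dict mapping w -> backing-off score (assumed unigram back-off)
    vocab:  list of all words
    """
    n_h = sum(counts.get((h, v), 0) for v in vocab)   # total count of history h
    if n_h == 0:
        raise ValueError("history was never observed")
    n_hw = counts.get((h, w), 0)
    if n_hw > 0:
        # Seen event: discounted relative frequency.
        return (1.0 - lam[h]) * n_hw / n_h
    # Unseen event: share of the discounted mass, renormalised over
    # the words that were not observed after h.
    unseen_mass = sum(beta[v] for v in vocab if counts.get((h, v), 0) == 0)
    return lam[h] * beta[w] / unseen_mass


# Toy data (illustrative values only).
vocab = ["a", "b", "c", "d"]
counts = {("x", "a"): 3, ("x", "b"): 1}
lam = {"x": 0.25}
beta = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}

probs = {w: linear_discount_prob(w, "x", counts, lam, beta, vocab) for w in vocab}
```

In this toy setting the probabilities over the vocabulary sum to one, because the history has at least one unseen word to absorb the discounted mass.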

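The unknown parameters are estimated by maximum likelihood in combination with leaving-one-out. We obtain the log-likelihood function:

\begin{eqnarray*}
F & = & \sum_{h,w} N(h,w) \log p_{-(h,w)}(w|h) \\
  & = & \sum_{h,w:\, N(h,w)=1} \log \left[ \lambda_h \, \frac{\beta(w|\bar{h})}{\sum_{w':\, N(h,w')=0} \beta(w'|\bar{h})} \right]
        + \sum_{h,w:\, N(h,w)>1} N(h,w) \log \left[ (1-\lambda_h) \, \frac{N(h,w)-1}{N(h)-1} \right]
\end{eqnarray*}

where $p_{-(h,w)}(w|h)$ denotes the probability distribution for leaving out the event $(h,w)$ from the training data.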

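By doing some elementary manipulations, we can decompose the log-likelihood function into two parts, one of which depends only on the discounting parameters $\lambda_h$ and the other only on the backing-off distribution $\beta(w|\bar{h})$:
\begin{eqnarray*}
F & = & F(\{\lambda_h\}) + F(\{\beta(w|\bar{h})\}) + \mbox{const}
\end{eqnarray*}
where the constant does not depend on either set of parameters.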

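The $\lambda_h$ dependent part is:

\begin{eqnarray*}
F(\{\lambda_h\}) & = & \sum_{h,w:\, N(h,w)=1} \log \lambda_h + \sum_{h,w:\, N(h,w)>1} N(h,w) \log (1-\lambda_h)
\end{eqnarray*}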

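Taking the partial derivatives with respect to $\lambda_h$ and equating them to zero, we obtain the closed-form solution:
\begin{eqnarray*}
\lambda_h & = & \frac{1}{N(h)} \sum_{w:\, N(h,w)=1} 1
\end{eqnarray*}
The same value is obtained when we compute the probability mass of unseen words in the training data for a given history $h$, since under leaving-one-out the unseen words are exactly those observed once:
\begin{eqnarray*}
\sum_{w:\, N(h,w)=1} \frac{N(h,w)}{N(h)} & = & \frac{1}{N(h)} \sum_{w:\, N(h,w)=1} 1
\end{eqnarray*}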
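
The closed-form estimate can be checked numerically. The sketch below assumes that, for a single history, the part of the leaving-one-out log-likelihood that depends on the discounting parameter is (number of singleton words) times log(lambda) plus (total count minus number of singleton words) times log(1 - lambda), and compares the ratio of singletons to total count against a coarse grid search. The function names and the toy counts are hypothetical.

```python
import math


def loo_lambda(counts_h):
    """Leaving-one-out estimate: singleton words divided by total count.

    counts_h: dict mapping w -> count of (h, w) for one fixed history h.
    """
    n_total = sum(counts_h.values())
    n_single = sum(1 for c in counts_h.values() if c == 1)
    return n_single / n_total


def lambda_objective(lam, counts_h):
    """Lambda-dependent part of the leaving-one-out log-likelihood for one h."""
    n_total = sum(counts_h.values())
    n_single = sum(1 for c in counts_h.values() if c == 1)
    return n_single * math.log(lam) + (n_total - n_single) * math.log(1.0 - lam)


# Toy counts for one history: total count 10, three singleton words.
counts_h = {"a": 5, "b": 2, "c": 1, "d": 1, "e": 1}
lam_hat = loo_lambda(counts_h)

# Coarse grid search over (0, 1) to confirm the closed form is the maximiser.
grid = [i / 100 for i in range(1, 100)]
best = max(grid, key=lambda x: lambda_objective(x, counts_h))
```

With these counts the closed form gives 3/10, and the grid search selects the same value.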

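To estimate the backing-off distribution $\beta(w|\bar{h})$, we rearrange the sums:

\begin{eqnarray*}
F(\{\beta(w|\bar{h})\}) & = & \sum_{h,w:\, N(h,w)=1} \log \frac{\beta(w|\bar{h})}{\sum_{w':\, N(h,w')=0} \beta(w'|\bar{h})} \\
 & = & \sum_{\bar{h}} \sum_{w} N_1(\bar{h},w) \log \beta(w|\bar{h}) - \sum_{h} n_1(h) \log \sum_{w':\, N(h,w')=0} \beta(w'|\bar{h})
\end{eqnarray*}

where $n_1(h)$ is the number of singletons $(h,w)$ for a given history $h$, i.e. the number of words following $h$ exactly once, and where $N_1(\bar{h},w)$ is defined as:
\begin{eqnarray*}
N_1(\bar{h},w) & := & \sum_{g:\, \bar{g}=\bar{h},\, N(g,w)=1} 1
\end{eqnarray*}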

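Taking the derivative, we have:

\begin{eqnarray*}
\frac{\partial F}{\partial \beta(w|\bar{h})} & = & \frac{N_1(\bar{h},w)}{\beta(w|\bar{h})} - \sum_{g:\, \bar{g}=\bar{h},\, N(g,w)=0} \frac{n_1(g)}{\sum_{w':\, N(g,w')=0} \beta(w'|\bar{h})}
\end{eqnarray*}

where we have taken into account that there are only contributions from those histories $g$ for which $w$ appears in the sum over $w'$. We do not know a closed-form solution for $\beta(w|\bar{h})$. By extending the sum over all histories $g$ with $\bar{g}=\bar{h}$ [Kneser & Ney (1995)], the subtracted term no longer depends on $w$, and we obtain the approximation:
\begin{eqnarray*}
\beta(w|\bar{h}) & = & \frac{N_1(\bar{h},w)}{\sum_{w'} N_1(\bar{h},w')}
\end{eqnarray*}
For convenience, we have chosen the normalisation $\sum_{w} \beta(w|\bar{h}) = 1$. This type of backing-off distribution will be referred to as the singleton distribution.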
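
The singleton distribution can be sketched in a few lines of Python. The code below assumes that histories are tuples of words and that the generalised history is obtained by dropping the oldest word; that mapping, the function names and the toy counts are assumptions for illustration, not something fixed by the text. For each generalised history it counts how many full histories form a singleton with a word, then normalises over words.

```python
from collections import defaultdict


def generalise(h):
    """Hypothetical generalisation: drop the oldest word of the history tuple."""
    return h[1:]


def singleton_distribution(counts):
    """Relative frequency of singleton events per generalised history.

    counts: dict mapping (h, w) -> count, with h a tuple of words.
    Returns a dict mapping generalised history -> {w: backing-off probability}.
    """
    n1 = defaultdict(lambda: defaultdict(int))   # singleton counts per (h_bar, w)
    for (h, w), c in counts.items():
        if c == 1:                               # (h, w) was seen exactly once
            n1[generalise(h)][w] += 1
    beta = {}
    for h_bar, row in n1.items():
        total = sum(row.values())
        beta[h_bar] = {w: k / total for w, k in row.items()}
    return beta


# Toy bigram counts: histories are 1-tuples, so the generalised
# history is the empty tuple (unigram back-off).
counts = {
    (("the",), "cat"): 1,
    (("a",), "cat"): 1,
    (("the",), "dog"): 3,
    (("a",), "dog"): 1,
    (("big",), "fish"): 2,
}
beta = singleton_distribution(counts)
```

Here "cat" is a singleton after two histories and "dog" after one, so they receive 2/3 and 1/3 respectively. A word such as "fish" that never occurs in a singleton event gets no entry, which in practice may call for additional smoothing of the backing-off distribution.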

   




EAGLES SWLG SoftEdition, May 1997.