The basic idea is to subtract a constant
from all counts *r*>0 and thus, in particular,
to leave the high counts virtually intact.
The intuitive justification is that a particular event
that has been seen exactly *r* times
in the training data
is likely to occur *r*-1, *r* or *r*+1 times in a new
set of data. Therefore, we assume a model where the counts *r* are
modified by an additive offset. From the normalisation constraint,
it immediately follows that this must be a negative constant since
the unseen events require a non-zero probability.
Experimental results in [Ney & Essen (1993)] show that the resulting estimates
are close to estimates obtained from the Turing-Good formula after
suitable smoothing [Good (1953), Nadas (1985)].
We define the model for absolute discounting. Let *N* be the total number of observations, *n_r* the number of distinct events seen exactly *r* times, *n_+* the number of distinct seen events and *n_0* the number of unseen events; an event seen *r* times then receives the probability

$$p_r = \frac{r-b}{N} \quad (r > 0), \qquad p_0 = \frac{b\,n_+}{n_0\,N}, \qquad 0 < b < 1.$$

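As a minimal sketch, assuming the standard absolute-discounting model in which each seen count *r* is replaced by *r*-*b* and the freed mass is spread uniformly over the unseen events (the function name is illustrative):

```python
from collections import Counter

def absolute_discount_probs(counts, vocab_size, b=0.5):
    """Subtract a constant 0 < b < 1 from every nonzero count r and
    redistribute the freed probability mass uniformly over unseen events."""
    N = sum(counts.values())            # total number of observations
    n_seen = len(counts)                # distinct events with r > 0
    n_unseen = vocab_size - n_seen      # events with r = 0
    probs = {w: (r - b) / N for w, r in counts.items()}
    # total discounted mass is b * n_seen / N; split it evenly over unseen events
    p_unseen = b * n_seen / (n_unseen * N) if n_unseen else 0.0
    return probs, p_unseen

counts = Counter("abracadabra")        # a:5, b:2, r:2, c:1, d:1; N = 11
probs, p_unseen = absolute_discount_probs(counts, vocab_size=26, b=0.5)
total = sum(probs.values()) + (26 - len(counts)) * p_unseen   # sums to 1
```

Note that the high counts are indeed left virtually intact: the count 5 becomes 4.5, a relative change far smaller than for the singletons.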
We do the same manipulations as for linear discounting, i.e. separating the singletons, ordering and carrying out the sums. For the distribution over the unseen events, we obtain the same equation as for linear discounting. For the *b*-dependent part, we obtain the following leaving-one-out log-likelihood function:

$$F(b) = n_1 \log b + \sum_{r \ge 2} r\, n_r \log(r - 1 - b) + \text{const},$$

since leaving out one observation of a singleton turns it into an unseen event, while an event seen $r \ge 2$ times keeps the discounted count $r - 1 - b$.

Taking the partial derivative with respect to *b* and setting it to zero, we obtain the following equation after separating the term with *r*=2:

$$\frac{n_1}{b} = \frac{2 n_2}{1 - b} + \sum_{r \ge 3} \frac{r\, n_r}{r - 1 - b}.$$

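Although there is no closed-form solution, the root is easy to find numerically. A sketch, assuming the leaving-one-out condition $n_1/b = \sum_{r \ge 2} r\,n_r/(r-1-b)$ whose left-minus-right side is strictly decreasing on (0, 1) (identifiers are illustrative):

```python
def count_of_counts(counts):
    """n[r] = number of distinct events observed exactly r times."""
    n = {}
    for r in counts.values():
        n[r] = n.get(r, 0) + 1
    return n

def solve_discount(n, tol=1e-12):
    """Bisection for the root of g(b) = n_1/b - sum_{r>=2} r*n_r/(r-1-b):
    g decreases strictly on (0, 1), from +inf near 0 to -inf near 1
    (provided n_1 > 0 and n_2 > 0), so a unique root exists."""
    def g(b):
        return n.get(1, 0) / b - sum(
            r * nr / (r - 1 - b) for r, nr in n.items() if r >= 2)
    lo, hi = tol, 1.0 - tol
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = {1: 100, 2: 30, 3: 10, 5: 4}   # toy count-of-counts table
b = solve_discount(n)              # root lies strictly between 0 and 1
```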
For this equation, there is no closed-form solution.
However, there are upper and lower bounds.
As shown in the appendix, we have the upper bound:

$$b \le \frac{n_1}{n_1 + 2 n_2}$$

and the lower bound:

$$b \ge \frac{n_1}{n_1 + 2 \sum_{r \ge 2} n_r}.$$

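Both bounds are closed-form functions of the count-of-counts table. A sketch, assuming the bound forms above, i.e. $n_1/(n_1 + 2\sum_{r\ge2} n_r) \le b \le n_1/(n_1 + 2 n_2)$, where the upper bound is the widely used estimate for *b* (function name illustrative):

```python
def discount_bounds(n):
    """Closed-form bounds on the discounting constant b from the
    count-of-counts table n[r].  Assumed forms: the upper bound
    n1/(n1 + 2*n2) drops the positive r >= 3 terms of the
    leaving-one-out equation; the lower bound replaces each
    r/(r-1-b) by its maximum 2/(1-b) over r >= 2."""
    n1 = n.get(1, 0)
    upper = n1 / (n1 + 2 * n.get(2, 0))
    lower = n1 / (n1 + 2 * sum(nr for r, nr in n.items() if r >= 2))
    return lower, upper

lower, upper = discount_bounds({1: 100, 2: 30, 3: 10, 5: 4})
```

For this toy table the interval is [100/188, 100/160], i.e. roughly [0.53, 0.63], so the cheap upper bound already locates *b* quite precisely.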
EAGLES SWLG SoftEdition, May 1997.