In linear interpolation, a weighted average between the relative frequencies and the general distribution is computed:

$$p(w|h) \;=\; (1-\lambda)\,\frac{N(h,w)}{N(h)} \;+\; \lambda\,\beta(w|\bar h)$$

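As a quick numerical check (the counts and parameter values here are made up for illustration), with $N(h,w)=2$, $N(h)=4$, $\lambda=0.3$ and $\beta(w|\bar h)=0.1$, the model gives

$$p(w|h) \;=\; 0.7\cdot\tfrac{2}{4} + 0.3\cdot 0.1 \;=\; 0.38 .$$
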
In leaving-one-out, we have to renormalise by the following substitution:

$$\frac{N(h,w)}{N(h)} \;\longrightarrow\; \frac{N(h,w)-1}{N(h)-1} \;=:\; q(w|h)$$

The leaving-one-out likelihood function is:

$$F(\lambda) \;=\; \sum_{h,w} N(h,w)\,\log\bigl[(1-\lambda)\,q(w|h) + \lambda\,\beta(w|\bar h)\bigr]$$

Taking the partial derivatives and doing some term rearrangements
using $q(w|h)=0$ for $N(h,w)=1$,
we obtain the iteration formulae [Ney et al. (1994)]:

$$\hat\lambda \;=\; \frac{1}{N}\,\Bigl[\,n_1 \;+\; \sum_{h,w:\,N(h,w)>1} N(h,w)\,\frac{\lambda\,\beta(w|\bar h)}{(1-\lambda)\,q(w|h)+\lambda\,\beta(w|\bar h)}\Bigr],$$

where $n_1$ is the number of singletons, i.e. pairs $(h,w)$ with $N(h,w)=1$, and $N$ is the total number of events.

Similarly, we have for the generalised distribution $\beta(w|\bar h)$:

$$\hat\beta(w|\bar h) \;\propto\; n_1(\bar h,w) \;+\; \sum_{h:\,\bar h(h)=\bar h,\;N(h,w)>1} N(h,w)\,\frac{\lambda\,\beta(w|\bar h)}{(1-\lambda)\,q(w|h)+\lambda\,\beta(w|\bar h)},$$

where $n_1(\bar h,w)$ counts the singletons $(h,w)$ whose history $h$ is generalised to $\bar h$.

Again, the dominating effect of the singletons is evident.
For the special choice $\beta(w|\bar h)=0$ if $N(h,w)>0$, we have
linear discounting again and recover the corresponding equations.
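
As a concrete illustration, the following minimal sketch iterates the formula for $\hat\lambda$ above to a fixed point. The toy bigram counts, the unigram stand-in for $\beta(w|\bar h)$ and all variable names are made up for this example:

```python
from collections import defaultdict

# Hypothetical toy bigram counts N(h, w); any real corpus counts would do.
counts = {("the", "cat"): 1, ("the", "dog"): 3, ("a", "cat"): 1, ("a", "car"): 2}

N = sum(counts.values())                 # total number of events
N_h = defaultdict(int)                   # history marginals N(h)
for (h, w), n in counts.items():
    N_h[h] += n

def beta(w):
    # Stand-in for the generalised distribution beta(w|h_bar): a unigram estimate.
    return sum(n for (_, w2), n in counts.items() if w2 == w) / N

def q(h, w):
    # Leaving-one-out relative frequency; zero for singletons N(h, w) = 1.
    return (counts[(h, w)] - 1) / (N_h[h] - 1)

n1 = sum(1 for n in counts.values() if n == 1)   # number of singleton pairs

lam = 0.5                                # initial guess for lambda
for _ in range(100):                     # fixed-point iteration of the formula
    rest = sum(
        n * lam * beta(w) / ((1 - lam) * q(h, w) + lam * beta(w))
        for (h, w), n in counts.items()
        if n > 1
    )
    lam = (n1 + rest) / N

print(f"estimated lambda = {lam:.3f}")
```

The singleton count $n_1$ enters the update directly, which makes the dominating role of the singletons visible in the code as well.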

The model of linear interpolation considered so far is special
because it involves only two types of histories, namely $h$ and $\bar h$.
In general, we may have several generalised histories $\bar h_i = \bar h_i(h)$
for $i=1,2,\ldots$
For this general case of linear interpolation, the so-called EM algorithm provides
an efficient iterative procedure for estimating the unknown parameters
[Baum (1972), Dempster et al. (1977), Jelinek & Mercer (1980)]. To consider the details of the EM
algorithm, we have to specify the full interpolation model:

$$p(w|h) \;=\; \sum_i \lambda_i(h)\,\beta(w|\bar h_i), \qquad \sum_i \lambda_i(h) = 1 .$$

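As a familiar instance (this particular choice of generalised histories is ours, for illustration), a trigram history $h=(u,v)$ may be generalised to $\bar h_1(h)=(u,v)$, $\bar h_2(h)=(v)$ and $\bar h_3(h)=()$, giving the deleted-interpolation model of Jelinek & Mercer (1980):

$$p(w|u,v) \;=\; \lambda_1(h)\,\frac{N(u,v,w)}{N(u,v)} \;+\; \lambda_2(h)\,\frac{N(v,w)}{N(v)} \;+\; \lambda_3(h)\,\frac{1}{W},$$

where $W$ denotes the vocabulary size.
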
For the interpolation to work, the interpolation parameter $\lambda_i(h)$
must explicitly depend on the index $i$ of the generalised history.
The framework of the EM algorithm is based on the so-called
$Q(\vartheta;\hat\vartheta)$ function, where $\hat\vartheta$ is the new estimate obtained from
the previous estimate $\vartheta$ [Baum (1972), Dempster et al. (1977)].
The parameter $\vartheta$ stands for the whole
set of parameters to be estimated, i.e. $\vartheta = \{\lambda_i(h),\,\beta(w|\bar h_i)\}$.
The $Q(\vartheta;\hat\vartheta)$ function is an extension of
the usual log-likelihood function and is for our model:

$$Q(\vartheta;\hat\vartheta) \;=\; \sum_{h,w} N(h,w)\,\sum_i \frac{\lambda_i(h)\,\beta(w|\bar h_i)}{\sum_j \lambda_j(h)\,\beta(w|\bar h_j)}\;\log\bigl[\hat\lambda_i(h)\,\hat\beta(w|\bar h_i)\bigr]$$

Taking the partial derivatives with respect to $\hat\lambda_i(h)$
and observing the normalisation constraint $\sum_i \hat\lambda_i(h) = 1$,
we obtain the equation:

$$\hat\lambda_i(h) \;=\; \frac{1}{N(h)} \sum_w N(h,w)\,\frac{\lambda_i(h)\,\beta(w|\bar h_i)}{\sum_j \lambda_j(h)\,\beta(w|\bar h_j)}$$

In this form, the interpolation parameters $\lambda_i(h)$
depend on the history $h$, the number of which can be large.
Therefore, some sort of tying might often be useful [Jelinek & Mercer (1980)].

Taking the partial derivatives with respect to $\hat\beta(w|\bar h_i)$
and observing the normalisation constraint $\sum_w \hat\beta(w|\bar h_i) = 1$,
we obtain the equation:

$$\hat\beta(w|\bar h_i) \;\propto\; \sum_{h:\,\bar h_i(h)=\bar h_i} N(h,w)\,\frac{\lambda_i(h)\,\beta(w|\bar h_i)}{\sum_j \lambda_j(h)\,\beta(w|\bar h_j)}$$
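
The following is a minimal sketch of the EM re-estimation of the $\lambda_i(h)$ for two generalised histories (a bigram and a unigram context); the toy counts, the initial values and all names are made up, and the $\beta$ distributions are held fixed for brevity, since their re-estimation follows the same accumulate-and-normalise pattern as the equation above:

```python
# Hypothetical toy bigram counts N(h, w); h is the immediate predecessor word.
counts = {("the", "cat"): 2, ("the", "dog"): 3, ("a", "cat"): 1, ("a", "car"): 2}
vocab = sorted({w for (_, w) in counts})
hists = sorted({h for (h, _) in counts})
N = sum(counts.values())
N_h = {h: sum(n for (h2, _), n in counts.items() if h2 == h) for h in hists}

# Two generalised histories: i=0 keeps h (smoothed bigram), i=1 drops it
# (unigram). Both are kept fixed during the lambda re-estimation below.
def beta0(w, h):
    return (counts.get((h, w), 0) + 1) / (N_h[h] + len(vocab))

def beta1(w, h):
    return sum(n for (_, w2), n in counts.items() if w2 == w) / N

betas = [beta0, beta1]
lam = {h: [0.5, 0.5] for h in hists}     # lambda_i(h), initial guess

for _ in range(20):                      # EM iterations
    acc = {h: [0.0, 0.0] for h in hists}
    for (h, w), n in counts.items():
        denom = sum(lam[h][j] * betas[j](w, h) for j in range(2))
        for i in range(2):
            # posterior weight of component i, weighted by the count N(h, w)
            acc[h][i] += n * lam[h][i] * betas[i](w, h) / denom
    lam = {h: [a / N_h[h] for a in acc[h]] for h in hists}   # divide by N(h)

for h in hists:
    print(h, [round(v, 3) for v in lam[h]])
```

Note that estimating the $\lambda_i(h)$ on the same data from which the $\beta$ distributions were obtained drives the weights towards the most specific component; in practice, deleted or held-out data are used for this step [Jelinek & Mercer (1980)].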