next up previous contents index
Next: Absolute discounting and backing-off Up: Final note: the mathematics Previous: Linear discounting and backing-off

Linear interpolation

   

In linear interpolation, a weighted average between the relative frequencies tex2html_wrap_inline45737 and the general distribution tex2html_wrap_inline45703 is computed:


eqnarray10387

In leaving-one-out , we have to renormalise by the following substitution:


eqnarray10395

The leaving-one-out  likelihood function is:


eqnarray10402

Taking the partial derivatives and doing some term rearrangements using q(w|h)=0 for N(h,w)=1, we obtain the iteration formulae [Ney et al. (1994)]:


 eqnarray10408

Similarly, we have for the tex2html_wrap_inline45703:


eqnarray10423

Again, the dominating effect of the singletons is evident. For the special choice tex2html_wrap_inline46349 if N(h,w)>0, we have linear discounting  again and recover the corresponding equations.

The model of linear interpolation considered so far is special because it involves only two types of histories, namely h and tex2html_wrap_inline45681. In general, we may have several generalised histories tex2html_wrap_inline46357 for i=1,2,.... For this general case of linear interpolation, the so-called EM algorithm  provides an efficient iterative procedure for estimating the unknown parameters [Baum (1972), Dempster et al. (1977), Jelinek & Mercer (1980)]. To consider the details of the EM algorithm , we have to specify the full interpolation model:
eqnarray10450
For the interpolation to work, the interpolation parameter tex2html_wrap_inline46361 must explicitly depend on the index i of the generalised history. The framework of the EM algorithm  is based on the so-called tex2html_wrap_inline46365 function, where tex2html_wrap_inline46367 is the new estimate obtained from the previous estimate tex2html_wrap_inline44831 [Baum (1972), Dempster et al. (1977)]. The parameter tex2html_wrap_inline44831 stands for the whole set of parameters to be estimated. The tex2html_wrap_inline46365 function is an extension of the usual log-likelihood function and is for our model:


eqnarray10459

Taking the partial derivatives with respect to tex2html_wrap_inline46375 and observing the normalisation constraint, we obtain the equation:
eqnarray10474
In this form, the interpolation parameters tex2html_wrap_inline46375 depend on the history h the number of which can be large. Therefore, some sort of tying  might often be useful [Jelinek & Mercer (1980)].

Taking the partial derivatives with respect to tex2html_wrap_inline46381 and observing the normalisation constraint, we obtain the equation:
eqnarray10489
   



next up previous contents index
Next: Absolute discounting and backing-off Up: Final note: the mathematics Previous: Linear discounting and backing-off

EAGLES SWLG SoftEdition, May 1997. Get the book...