
Problem formulation

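When smoothing a trigram model with a bigram model, we have to keep in mind that the backing-off distribution itself requires smoothing. So the bigram model is itself smoothed by a unigram model, which in turn may be smoothed by a zerogram model. Thus, we can define the following levels for a trigram event (u,v,w):

trigram level: (u,v,w);
bigram level: (v,w);
unigram level: (w);
zerogram level: no word context (uniform distribution over the vocabulary).
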
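The chain of levels can be illustrated with the following Python sketch, which backs off from trigram to bigram to unigram to zerogram. It rests on an assumption made only for illustration: each level uses interpolated absolute discounting with a single discount `b`, which is not necessarily the smoothing scheme developed in this handbook. At each level, the probability mass freed by discounting the seen events of a history is proportional to the number of distinct seen successors, i.e. the vocabulary size W minus the number of unseen events for that history, and it is redistributed through the next-lower level. The function name `build_backoff_chain` is hypothetical, and every training token must belong to the vocabulary.

```python
from collections import Counter

def build_backoff_chain(tokens, vocab, b=0.5):
    """Hypothetical four-level chain: trigram -> bigram -> unigram -> zerogram.

    ASSUMPTION: interpolated absolute discounting with one discount b in (0, 1]
    at every level; an illustrative choice, not the handbook's model.
    """
    W = len(vocab)
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))   # N(u,v,w)
    bi = Counter(zip(tokens, tokens[1:]))                # N(v,w)
    uni = Counter(tokens)                                # N(w)

    # History totals and number of distinct seen successors per history;
    # W minus the successor count is the number of unseen events.
    tri_hist, tri_seen = Counter(), Counter()
    for (u, v, _), c in tri.items():
        tri_hist[(u, v)] += c
        tri_seen[(u, v)] += 1
    bi_hist, bi_seen = Counter(), Counter()
    for (v, _), c in bi.items():
        bi_hist[v] += c
        bi_seen[v] += 1
    uni_total, uni_seen = sum(uni.values()), len(uni)

    def zerogram(w):
        return 1.0 / W                                   # uniform over the vocabulary

    def discounted(count, total, seen, lower):
        # Subtract b from each seen count; the freed mass b * seen / total
        # is handed to the next-lower level.
        return max(count - b, 0) / total + b * seen / total * lower

    def unigram(w):
        return discounted(uni[w], uni_total, uni_seen, zerogram(w))

    def bigram(w, v):
        if bi_hist[v] == 0:                              # unseen history: use lower level
            return unigram(w)
        return discounted(bi[(v, w)], bi_hist[v], bi_seen[v], unigram(w))

    def trigram(w, u, v):
        if tri_hist[(u, v)] == 0:                        # unseen history: use lower level
            return bigram(w, v)
        return discounted(tri[(u, v, w)], tri_hist[(u, v)], tri_seen[(u, v)],
                          bigram(w, v))

    return trigram

vocab = ["a", "b", "c", "d"]
tokens = "a b a c a b b a".split()
p = build_backoff_chain(tokens, vocab)
print(sum(p(w, "a", "b") for w in vocab))  # sums to 1 up to rounding
```

Because each level is a proper distribution and the freed mass is passed down the chain, p(·|u,v) sums to one over the vocabulary, and a word never seen in training (here `d`) still receives probability through the zerogram level.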
It is helpful to state the notation used below explicitly, in particular the definitions of the so-called singletons (events observed exactly once in the training data) and of the unseen events (events never observed):

N(u,v,w): number of observations for trigram uvw;
N(u,v): number of observations for bigram uv;
n_0(u,v): number of unseen trigrams starting with uv;
n_1(·,v,w): number of trigram singletons ending in vw;
n_1(·,v,·): number of trigram singletons having v in the middle.
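The trigram-level quantities listed above can be computed from a tokenized corpus as in the following Python sketch. The helper name `trigram_level_counts` and the toy corpus are hypothetical. As an assumption, n_0(u,v) is only tabulated for histories uv that were actually observed; for an unobserved history every trigram is unseen, so n_0(u,v) would simply equal the vocabulary size W. All tokens are assumed to belong to the vocabulary.

```python
from collections import Counter

def trigram_level_counts(tokens, vocab):
    """Hypothetical helper: trigram-level counts for a list of tokens."""
    N3 = Counter(zip(tokens, tokens[1:], tokens[2:]))   # N(u,v,w)
    N2 = Counter(zip(tokens, tokens[1:]))               # N(u,v): bigram observations
    successors = Counter()    # number of distinct w with N(u,v,w) > 0
    n1_vw = Counter()         # n_1(.,v,w): trigram singletons ending in vw
    n1_v = Counter()          # n_1(.,v,.): trigram singletons with v in the middle
    for (u, v, w), c in N3.items():
        successors[(u, v)] += 1
        if c == 1:
            n1_vw[(v, w)] += 1
            n1_v[v] += 1
    W = len(vocab)
    # n_0(u,v): unseen trigrams starting with uv, for every observed history uv
    n0_uv = {h: W - successors[h] for h in N2}
    return N3, N2, n0_uv, n1_vw, n1_v

vocab = ["a", "b", "c", "d"]
tokens = "a b a c a b b a".split()
N3, N2, n0_uv, n1_vw, n1_v = trigram_level_counts(tokens, vocab)
print(N2[("a", "b")], n0_uv[("a", "b")], n1_vw[("b", "a")], n1_v["b"])  # 2 2 2 3
```

In this toy corpus every trigram occurs once, so all six trigrams are singletons; three of them (aba, abb, bba) have b in the middle, which gives n_1(·,b,·) = 3.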

The definitions at the bigram and unigram levels are analogous:

N(u): number of observations for unigram u;
n_0(v): number of unseen bigrams starting with v;
n_0: number of unseen unigrams.
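The bigram- and unigram-level counts can be obtained in the same way, as in this Python sketch. The helper name `lower_level_counts` is hypothetical, and as before the vocabulary is assumed to contain every training token, so that the unseen counts are never negative.

```python
from collections import Counter

def lower_level_counts(tokens, vocab):
    """Hypothetical helper: bigram- and unigram-level counts for a token list."""
    N1 = Counter(tokens)                        # N(u): unigram observations
    N2 = Counter(zip(tokens, tokens[1:]))       # N(v,w): bigram observations
    successors = Counter(v for (v, _) in N2)    # distinct w with N(v,w) > 0
    W = len(vocab)
    n0_v = {v: W - successors[v] for v in N1}   # n_0(v): unseen bigrams starting with v
    n0 = sum(1 for w in vocab if N1[w] == 0)    # n_0: unseen unigrams
    return N1, n0_v, n0

vocab = ["a", "b", "c", "d"]
tokens = "a b a c a b b a".split()
N1, n0_v, n0 = lower_level_counts(tokens, vocab)
print(N1["a"], n0_v["c"], n0)  # 4 3 1
```

Here c is only ever followed by a, so three of the four possible bigrams starting with c are unseen, and d is the single unseen unigram.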

EAGLES SWLG SoftEdition, May 1997.