When smoothing a trigram model with a bigram model, we have to keep in mind
that the backing-off distribution itself requires smoothing.
So the bigram model is itself smoothed by a unigram model,
which in turn may be smoothed by a zerogram model.
Thus, we can define the following levels for a trigram event (u,v,w):
 - the trigram level, which defines the relative trigram frequencies
   as the level to start with;
 - the bigram level;
 - the unigram level;
 - the zerogram level, if the unigram estimates are unreliable.
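The chain of levels can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: each level is taken as a raw, unsmoothed relative frequency, the zerogram is taken to be the uniform distribution over the observed vocabulary, and the function name `level_estimates` and the toy corpus are hypothetical. It does not combine the levels into a smoothed model.

```python
from collections import Counter


def level_estimates(tokens, u, v, w):
    """Raw relative frequencies for the event (u, v, w) at each level.

    Assumption: the vocabulary is the set of observed tokens, and the
    zerogram is the uniform distribution over that vocabulary.
    """
    vocab = set(tokens)
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))

    # History counts: how often each context was followed by some word.
    tri_hist = Counter()
    for (a, b, _), n in tri.items():
        tri_hist[(a, b)] += n
    bi_hist = Counter()
    for (a, _), n in bi.items():
        bi_hist[a] += n

    def ratio(num, den):
        # None signals that the history was never observed.
        return num / den if den else None

    return {
        "trigram": ratio(tri[(u, v, w)], tri_hist[(u, v)]),
        "bigram": ratio(bi[(v, w)], bi_hist[v]),
        "unigram": ratio(uni[w], len(tokens)),
        "zerogram": ratio(1, len(vocab)),
    }


tokens = "a b a b c a b a".split()
estimates = level_estimates(tokens, "a", "b", "a")
```

When the trigram history uv has never been observed, the trigram entry is `None`, which is exactly the case in which the lower levels are needed.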
 
It is helpful to write down explicitly the notation used in the following,
in particular the definitions of the so-called singletons
and the unseen events:
 - N(u,v,w): number of observations for trigram uvw;
 - N(u,v): number of observations for bigram uv;
 - n_0(u,v,·): number of unseen trigrams starting with uv;
 - n_1(·,v,w): number of trigram singletons ending in vw;
 - n_1(·,v,·): number of trigram singletons having v in the middle.
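As a rough illustration, the trigram-level quantities listed above can be computed from a token sequence as follows. This is a sketch, not a reference implementation: the function names are hypothetical, a singleton is taken to be a trigram observed exactly once, and the unseen count assumes a closed vocabulary passed in as `vocab`.

```python
from collections import Counter


def trigram_counts(tokens):
    """Number of observations for every trigram in the token sequence."""
    return Counter(zip(tokens, tokens[1:], tokens[2:]))


def bigram_observations(tri, u, v):
    """Number of observations for bigram uv as a trigram history."""
    return sum(n for (a, b, _), n in tri.items() if (a, b) == (u, v))


def unseen_after(tri, vocab, u, v):
    """Number of unseen trigrams starting with uv (closed vocabulary assumed)."""
    seen = {w for (a, b, w) in tri if (a, b) == (u, v)}
    return len(vocab) - len(seen)


def singletons_ending_in(tri, v, w):
    """Number of trigrams observed exactly once that end in vw."""
    return sum(1 for (_, b, c), n in tri.items() if (b, c) == (v, w) and n == 1)


def singletons_with_middle(tri, v):
    """Number of trigrams observed exactly once that have v in the middle."""
    return sum(1 for (_, b, _), n in tri.items() if b == v and n == 1)


tokens = "a b a b c a b a".split()
vocab = set(tokens)
tri = trigram_counts(tokens)
```

On this toy corpus the trigram (a, b, a) occurs twice, so it is not a singleton, while (a, b, c) occurs once and is.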
 
The definitions at the bigram and unigram levels are similar:
 - N(u): number of observations for unigram u;
 - n_0(v,·): number of unseen bigrams starting with v;
 - n_0(·): number of unseen unigrams.
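The bigram-level and unigram-level quantities can be sketched in the same way. The example below assumes that the vocabulary is given separately from the training text, so that some words may never be observed; the function name `lower_level_counts` and the extra vocabulary word `d` are hypothetical.

```python
from collections import Counter


def lower_level_counts(tokens, vocab):
    """Unigram observations, unseen bigrams per history, and unseen unigrams.

    Assumption: vocab is a closed vocabulary that may contain words
    absent from the training tokens.
    """
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))

    # For each history v: vocabulary size minus the distinct successors seen.
    unseen_bigrams = {
        v: len(vocab) - len({w for (a, w) in bi if a == v}) for v in vocab
    }

    # Vocabulary words that never occur in the training tokens.
    unseen_unigrams = sum(1 for w in vocab if uni[w] == 0)
    return uni, unseen_bigrams, unseen_unigrams


tokens = "a b a b c a b a".split()
vocab = {"a", "b", "c", "d"}
uni, unseen_bigrams, unseen_unigrams = lower_level_counts(tokens, vocab)
```

Here the word `d` is in the vocabulary but never observed, so it is the only unseen unigram and every bigram starting with it is unseen.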
 
 
 
 
EAGLES SWLG SoftEdition, May 1997.