When smoothing a trigram model with a bigram model, we have to keep in mind
that the backing-off distribution
itself requires smoothing.
So the bigram is itself smoothed by a unigram,
which in turn may be
smoothed by a zerogram (a uniform distribution over the vocabulary).
Thus, we can define the following levels for a trigram event
(u,v,w):
- the trigram level, which defines
the relative trigram frequencies as the level to start with;
- the bigram level;
- the unigram level;
- the zerogram level,
used if the unigram estimates are unreliable.
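The chain of levels can be illustrated with a minimal sketch. It assumes simple linear interpolation with one fixed weight `lam` at every level, which is a stand-in chosen for brevity and not necessarily the smoothing method used in this chapter. The names `train_counts`, `interpolate`, `smoothed_trigram` and `lam` are hypothetical.

```python
from collections import Counter

def train_counts(tokens):
    """Collect trigram, bigram, and unigram counts from a list of tokens."""
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    bi = Counter(zip(tokens, tokens[1:]))
    uni = Counter(tokens)
    return tri, bi, uni

def interpolate(count, history_count, lower, lam):
    """Mix a relative frequency with the estimate of the next lower level.

    If the history was never observed, the relative frequency is undefined,
    so the lower-level estimate is returned unchanged.
    """
    if history_count == 0:
        return lower
    return lam * count / history_count + (1.0 - lam) * lower

def smoothed_trigram(u, v, w, tri, bi, uni, vocab, lam=0.5):
    """Estimate p(w | u, v) by descending trigram -> bigram -> unigram -> zerogram.

    lam is an assumed fixed interpolation weight, not a value from the text.
    """
    tri_hist = sum(tri[(u, v, x)] for x in vocab)  # observations of history uv
    bi_hist = sum(bi[(v, x)] for x in vocab)       # observations of history v
    p_zero = 1.0 / len(vocab)                      # zerogram level: uniform
    p_uni = interpolate(uni[w], sum(uni.values()), p_zero, lam)
    p_bi = interpolate(bi[(v, w)], bi_hist, p_uni, lam)
    return interpolate(tri[(u, v, w)], tri_hist, p_bi, lam)

tokens = "a b a b c".split()
vocab = sorted(set(tokens))
tri, bi, uni = train_counts(tokens)
# The unseen trigram (b, a, c) still receives probability mass from lower levels.
print(smoothed_trigram("b", "a", "c", tri, bi, uni, vocab))
```

Because every level is a proper distribution over the vocabulary, the interpolated trigram estimates also sum to one for each history, including histories that were never observed.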
It is helpful to explicitly write down the notation
used in the following, in particular
the definitions of the so-called singletons
and the unseen events:
- N(u,v,w): number of observations for trigram uvw;
- N(u,v): number of observations for bigram uv;
- n_0(u,v,·): number of unseen trigrams starting with uv;
- n_1(·,v,w): number of trigram singletons ending in vw;
- n_1(·,v,·): number of trigram singletons
having v in the middle.
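As an illustration of the trigram-level quantities listed above, the following sketch counts them from a token list. It makes two assumptions that are not stated in the text: the number of observations for a history uv is taken as the sum of the trigram counts starting with uv, and the number of unseen trigrams is measured against a fixed vocabulary passed in as `vocab`. The function and variable names are hypothetical.

```python
from collections import Counter

def trigram_level_counts(tokens, vocab):
    """Count trigram observations, unseen events, and singletons."""
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    vocab_size = len(vocab)

    obs_uv = Counter()         # observations for bigram history uv
    distinct_after = Counter() # distinct words w seen after uv
    singles_vw = Counter()     # trigram singletons ending in vw
    singles_v = Counter()      # trigram singletons having v in the middle

    for (u, v, w), n in tri.items():
        obs_uv[(u, v)] += n
        distinct_after[(u, v)] += 1
        if n == 1:  # a singleton is a trigram observed exactly once
            singles_vw[(v, w)] += 1
            singles_v[v] += 1

    # Unseen trigrams starting with uv, for every history that was observed;
    # a history that never occurred has vocab_size unseen trigrams.
    unseen_uv = {uv: vocab_size - k for uv, k in distinct_after.items()}
    return tri, obs_uv, unseen_uv, singles_vw, singles_v

tokens = "a b a b c a b a".split()
tri, obs_uv, unseen_uv, singles_vw, singles_v = trigram_level_counts(
    tokens, vocab={"a", "b", "c"}
)
print(obs_uv[("a", "b")], unseen_uv[("a", "b")], singles_vw[("a", "b")], singles_v["a"])
```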
The definitions at the bigram and unigram level are similar:
- N(u): number of observations for unigram u;
- n_0(v,·): number of unseen bigrams starting with v;
- n_0(·): number of unseen unigrams.
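The bigram-level and unigram-level quantities can be counted in the same way. The sketch below assumes a fixed vocabulary `vocab` that may contain words absent from the training tokens, since unseen unigrams can only be counted against such a reference list. All names are hypothetical.

```python
from collections import Counter

def lower_level_counts(tokens, vocab):
    """Count unigram observations and unseen events at the bigram and unigram level."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(vocab)

    # Iterating over the Counter yields each distinct bigram once, so this
    # counts the distinct successors w observed after each word v.
    successors = Counter(v for (v, w) in bi)

    # Unseen bigrams starting with v: vocabulary words never observed after v.
    unseen_bigrams = {v: vocab_size - successors[v] for v in vocab}
    # Unseen unigrams: vocabulary words that never occur in the tokens.
    unseen_unigrams = sum(1 for w in vocab if uni[w] == 0)
    return uni, unseen_bigrams, unseen_unigrams

tokens = "a b a b c".split()
uni, unseen_bigrams, unseen_unigrams = lower_level_counts(
    tokens, vocab=["a", "b", "c", "d"]
)
print(uni["a"], unseen_bigrams["b"], unseen_unigrams)
```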
EAGLES SWLG SoftEdition, May 1997.