
Cache

 

The so-called cache model has been used successfully by a number of researchers [Kuhn & De Mori (1990), Jelinek et al. (1991b), Rosenfeld (1994)]. The cache can be viewed as a short-term memory where the probability of the most recent words is increased. In other words, the cache model takes into account that the words of the vocabulary are not distributed homogeneously over a text, but tend to occur in clusters. The typical mathematical formulation for the cache contribution is as follows:
\[
p_C(w_n \mid w_{n-M}^{n-1}) = \frac{1}{M} \sum_{m=1}^{M} \delta(w_n, w_{n-m})
\]
where $M$ is the number of predecessor words held in the cache and $\delta(w, v)$ denotes the Kronecker function, which is 1 if the two arguments are the same and 0 otherwise. The probability of the cache model is typically combined with the trigram model by linear interpolation. There are refinements that suggest themselves:

The cache concept considered so far is based on unigrams only. As in the case of unigrams, we can argue that word bigrams and trigrams tend to occur in clusters, too. Extensions of the unigram cache to bigrams and/or trigrams have been successfully used in [Jelinek et al. (1991b)] and [Rosenfeld (1994)]. For example, in the case of a bigram cache, the bigram counts based on the most recent history are used to compute the probabilities for the bigram cache.

The cache model described here can be interpreted as a special case of so-called adaptive language models that adapt their probabilities to the most recent history, say the last 100 to 1000 predecessor words. In contrast, a non-adaptive language model does not depend on the test data, but remains unchanged as trained on the training data. For other types of adaptive language models see [Essen & Steinbiss (1992)] and [Rosenfeld (1994)].
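
The unigram cache, its linear interpolation with a static model, and the bigram extension described above can be illustrated with a short Python sketch. It is not taken from the cited work: the class names, the function `interpolate`, the interpolation weight `lam`, and the static probability used in the usage lines are all hypothetical. One further assumption is that the cache probability is normalised by the number of words actually held, so that it sums to one even before the cache is full.

```python
from collections import Counter, deque


class UnigramCache:
    """Short-term memory over the last `size` words (M in the formula)."""

    def __init__(self, size):
        self.size = size
        self.window = deque()    # the most recent words, oldest first
        self.counts = Counter()  # word -> occurrences inside the window

    def add(self, word):
        self.window.append(word)
        self.counts[word] += 1
        if len(self.window) > self.size:
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def prob(self, word):
        # Relative frequency of `word` among the cached words, i.e. the
        # Kronecker-delta sum divided by the number of cached words.
        if not self.window:
            return 0.0
        return self.counts[word] / len(self.window)


class BigramCache:
    """Same idea for bigrams: counts of (predecessor, word) pairs."""

    def __init__(self, size):
        self.size = size
        self.window = deque()         # the most recent (prev, word) pairs
        self.pair_counts = Counter()  # (prev, word) -> count in window
        self.prev_counts = Counter()  # prev -> count as predecessor

    def add(self, prev, word):
        self.window.append((prev, word))
        self.pair_counts[(prev, word)] += 1
        self.prev_counts[prev] += 1
        if len(self.window) > self.size:
            old_prev, old_word = self.window.popleft()
            self.pair_counts[(old_prev, old_word)] -= 1
            self.prev_counts[old_prev] -= 1
            if self.pair_counts[(old_prev, old_word)] == 0:
                del self.pair_counts[(old_prev, old_word)]
            if self.prev_counts[old_prev] == 0:
                del self.prev_counts[old_prev]

    def prob(self, word, prev):
        # Relative frequency of `word` after `prev` within the window;
        # 0.0 if `prev` has not been seen as a predecessor recently.
        seen = self.prev_counts[prev]
        return self.pair_counts[(prev, word)] / seen if seen else 0.0


def interpolate(p_static, p_cache, lam):
    """Linear interpolation; `lam` is the (hypothetical) cache weight."""
    return (1.0 - lam) * p_static + lam * p_cache


# Usage: feed the running text into the cache, then mix with a static model.
cache = UnigramCache(size=5)
for w in "the cache model increases the probability of recent words".split():
    cache.add(w)
p_static = 0.01  # assumed trigram probability, for illustration only
print(interpolate(p_static, cache.prob("the"), lam=0.1))
```

In practice the weight `lam` would be estimated on held-out data rather than fixed by hand, and the window size would be in the range mentioned above (100 to 1000 words).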

 




EAGLES SWLG SoftEdition, May 1997.