In smoothing trigram models by linear discounting or interpolation,
the smoothing parameters, in both cases denoted by $\lambda$, have been assumed
to depend on the word pair *uv*.
Since the number of such word pairs is huge,
it is desirable to reduce the number of smoothing parameters.
Often, the parameters are *pooled or tied*
across different histories by setting [Jelinek & Mercer (1980)]:

$$\lambda(h) := \lambda\bigl(N(h)\bigr)$$

This means that the parameters are tied across histories *h* with
the same count *N*(*h*).
It is straightforward to repeat the derivations
for this type of tying.
As a result, we obtain:

Similarly, when assuming the parameters to
be *independent* of the histories *h*, we obtain:

For absolute discounting, we obtain similar formulae in
the case of tying.
In particular, the experimental results for absolute discounting
show that there is no degradation in perplexity when
using history-*independent* discounting parameters [Ney & Essen (1993)].
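
As a concrete illustration of count-based tying, the following C++ sketch (the types, names, and data layout are our own assumptions, not taken from the text) evaluates an interpolated trigram probability with one smoothing parameter per history count *N*(*h*) rather than one per word pair *uv*. The $\lambda$ values themselves are taken as given, e.g. trained on held-out data in the spirit of Jelinek & Mercer (1980):

```cpp
#include <cstdint>
#include <map>
#include <utility>

// Sketch only: types and names are illustrative, not from the text.
using WordId = uint32_t;
using History = std::pair<WordId, WordId>;  // h = (u, v)
using Trigram = std::pair<History, WordId>; // ((u, v), w)

struct TiedTrigramModel {
    std::map<History, uint32_t> historyCount;  // N(u,v)
    std::map<Trigram, uint32_t> trigramCount;  // N(u,v,w)
    std::map<uint32_t, double> lambdaByCount;  // tied: N(h) -> lambda(N(h))
    double defaultLambda = 1.0;                // history-independent fallback

    // lambda depends on the history h only through its count N(h).
    double lambda(uint32_t historyCnt) const {
        auto it = lambdaByCount.find(historyCnt);
        return it != lambdaByCount.end() ? it->second : defaultLambda;
    }

    // p(w|u,v) = (1 - lambda) * N(u,v,w)/N(u,v) + lambda * p(w|v),
    // where pBigram = p(w|v) is supplied by the smoothed bigram model.
    double prob(WordId u, WordId v, WordId w, double pBigram) const {
        auto hc = historyCount.find({u, v});
        if (hc == historyCount.end()) return pBigram;  // unseen history
        auto tc = trigramCount.find({{u, v}, w});
        double f = (tc != trigramCount.end())
                       ? static_cast<double>(tc->second) / hc->second
                       : 0.0;
        double lam = lambda(hc->second);
        return (1.0 - lam) * f + lam * pBigram;
    }
};
```

In this sketch, the history-*independent* case corresponds to a table with a single (default) entry, which makes explicit how tying reduces the number of free smoothing parameters.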

In a real implementation, whether for off-line purposes or in an operational prototype system, the computational complexity of the trigram or bigram model has to be considered, namely its memory requirements and access time.

- To store the trigram and bigram counts, we cannot use a
  standard array implementation. Even for a simple model like
  a bigram model, it is impossible to store the full table of
  conditional probabilities, which would require
  20000 × 20000 = 400 million entries for a vocabulary of 20000 words.
  Instead, often a so-called list implementation is used,
  which works as follows. For each word *w*, we have a pointer into
  the actual list, and for each observed trigram (*u*,*v*,*w*), an
  entry of the list consists of two parts, namely the index pair
  (*u*,*v*) and the trigram count *N*(*u*,*v*,*w*). For efficiency,
  this organisation should be combined with the counts for the bigram
  models. A further reduction in memory costs is achieved by removing
  the *singletons* [Katz (1987)] from the list of seen bigrams and
  trigrams. In particular, by simply removing the *trigram singletons*,
  the storage costs can be reduced dramatically without affecting the
  perplexity much. We give a specific example for the WSJ corpus of
  39 million running words, for which experimental results will be
  described. Here, the storage cost is 126 Mbyte when 4-byte values
  are used for all pointers and counts; it can be reduced to 66 Mbyte
  by omitting the trigram singletons.
- For fast access to the bigram and trigram lists, we have to use
  suitable techniques, e.g. binary search; a sketch of such a list
  organisation is given after this list. Short-term storing of
  language model probabilities might also be appropriate, depending
  on the interaction between the language model and the search.
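
The following C++ sketch makes the list organisation and the binary-search access concrete; the struct layout and names are our assumptions, chosen so that each trigram entry takes three 4-byte values (the index pair and the count), consistent with the 4-byte pointers and counts mentioned above. A per-word offset array plays the role of the pointers, and removing the trigram singletons is a simple filtering pass over the entries:

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative sketch of the list organisation described above.
// For each word w, offsets[w] points into one shared entry array; the
// entries for w hold the index pair (u,v) and the count N(u,v,w),
// sorted by (u,v) so that a lookup is a binary search.
struct Entry {
    uint32_t u, v;   // the index pair identifying the history
    uint32_t count;  // trigram count N(u,v,w)
};

struct TrigramList {
    std::vector<size_t> offsets;  // offsets[w]..offsets[w+1]: entries for w
    std::vector<Entry> entries;   // sorted by (u,v) within each word's range

    // Binary search for N(u,v,w); returns 0 for unseen trigrams.
    uint32_t count(uint32_t u, uint32_t v, uint32_t w) const {
        auto first = entries.begin() + offsets[w];
        auto last = entries.begin() + offsets[w + 1];
        auto it = std::lower_bound(
            first, last, std::make_pair(u, v),
            [](const Entry& e, const std::pair<uint32_t, uint32_t>& k) {
                return std::make_pair(e.u, e.v) < k;
            });
        return (it != last && it->u == u && it->v == v) ? it->count : 0;
    }

    // Remove trigram singletons (count == 1) and rebuild the offsets;
    // this is the pruning step that shrinks the tables substantially.
    void pruneSingletons() {
        std::vector<size_t> newOffsets(offsets.size());
        std::vector<Entry> kept;
        kept.reserve(entries.size());
        for (size_t w = 0; w + 1 < offsets.size(); ++w) {
            newOffsets[w] = kept.size();
            for (size_t i = offsets[w]; i < offsets[w + 1]; ++i)
                if (entries[i].count > 1) kept.push_back(entries[i]);
        }
        newOffsets.back() = kept.size();
        offsets = std::move(newOffsets);
        entries = std::move(kept);
    }
};
```

As a rough consistency check under this assumed 12-byte-per-entry layout, the quoted reduction from 126 to 66 Mbyte would correspond to removing on the order of five million singleton trigram entries.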