What we have considered so far could be called word-based perplexity because the smallest units are the words of the language. This word-based perplexity can be extended in two directions. First, we can go from written words to letters (characters) and define a letter-based perplexity. The advantage is that the set of symbols is guaranteed to be closed, since it is simply the alphabet, i.e. the set of characters, while at the same time the vocabulary is unlimited, because any word can be made up from the symbols of the alphabet. The additional complication, however, is that the constraints of the spelling dictionary must be taken into account in addition to the usual language model constraints.
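To make the connection between the two measures explicit, we can use the usual definition of perplexity as an inverse geometric-mean probability; the symbols $N$ and $M$ and the subscripts are introduced here only for illustration, and the relation assumes that the same probability is assigned to the text at both levels:
\[
PP_{\mathrm{letter}} \;=\; \Pr(c_1 \ldots c_M)^{-1/M} \;=\; PP_{\mathrm{word}}^{\,N/M} ,
\]
where $N$ is the number of words of the text, $M$ the number of letters, and $M/N$ therefore the average word length in letters. Since $M/N > 1$, the letter-based perplexity is typically much smaller than the word-based perplexity of the same text.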
The second direction could be to go one step further and consider the set of phonemes as the set of basic units. In this case, the pronunciation constraints (or, more generally, the phonotactic constraints of the language) would be taken into account in the perplexity definition. Such a phoneme-based perplexity could measure all the constraints that are considered to be ``prior'' to the observed acoustic signal. For example, it is a well-known fact that the difficulty of a recognition task depends very much on the phonetic similarities of the words to be recognised. In particular, the lengths of the words to be recognised play an important role. As a first approximation, this could be taken into account by normalising the word-based perplexity with respect to the number of phonemes per spoken word. However, even in today's research systems, we are still limited to the word-based perplexity.
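Such a normalisation can be written down directly in the same spirit as the letter-based case; the symbol $\bar{L}$ is introduced here only for illustration:
\[
PP_{\mathrm{phoneme}} \;\approx\; PP_{\mathrm{word}}^{\,1/\bar{L}} ,
\]
where $\bar{L}$ denotes the average number of phonemes per spoken word. For a fixed word-based perplexity, a task with longer words then yields a lower phoneme-based perplexity, reflecting the fact that longer words provide more acoustic evidence and are, in this respect, easier to recognise.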