Few complete interactive dialogue systems have been systematically evaluated. The recommendations in this chapter are therefore based primarily on evaluation tests performed over the past few years, most of which concern rather simple tasks.
This lack of systematic evaluation stems partly from the fact that few interactive spoken language dialogue systems have yet been in real use, and partly from the absence of any stable categorisation of the basic units of dialogue systems that could serve as a reference. Other kinds of system do have such a reference: in electronic dictionaries, for instance, the basic units (lexical entries) are assigned grammatical categories and other characteristic features. For dialogue systems, there has been no such definitive categorisation of dialogue acts, nor, for that matter, any definitive understanding of ``dialogue grammar'', though state machines for turn sequencing are widely used. Furthermore, a dialogue system encompasses several different levels, each consisting of different components, which makes an overall evaluation more complex.
The set of recommendations contained in this chapter should therefore be seen as provisional, awaiting further refinement and extension as understanding in the area of interactive dialogue systems grows.