Dialogue systems (see also Chapter 13) include aspects of speech recognition and synthesis. Therefore, once again, much of what has been said before is relevant to this topic. Similarly, decisions about trial sizes can be made on the basis of the statistical information provided previously.
Consultations with the interactive systems group raised two main questions in addition to the issues associated with recognition and synthesis: (1) What statistical appraisal is appropriate for dialogue simulation systems (so-called Wizard of Oz (WOZ) simulations)? In particular, how can such simulations provide answers that have been properly tested, and what mechanisms exist for carrying these techniques over into actual systems? (2) What dialogue metrics are available for assessing systems?
It is easy to pose these questions but not so easy to answer them. Neither has a clear-cut answer, because researchers in these areas are still grappling with more fundamental questions. On the first question, the research literature has been more concerned with establishing appropriate methodologies for WOZ than with using them to deliver definite proposals for a specific system. Proposals for specific systems tend to appear in internal reports or conference proceedings rather than in the journals, and so are hard to obtain. Research on the second question has mainly consisted of qualitative assessments of limited amounts of manually analysed material. Although a handbook on Language Engineering Standards should include advice on the issues enumerated above, research has not yet reached a stage where it can provide many answers. Rather than say nothing about these issues, we provide a selection of the available information on WOZ and some work that might make dialogue metrics more quantitative.
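To make concrete what "more quantitative" dialogue metrics might look like, the following is a minimal illustrative sketch, not a metric drawn from the literature surveyed here: it counts system repair or clarification turns as a fraction of all system turns in a transcript annotated with dialogue acts. The turn representation and the act labels (`"repair"`, `"inform"`, etc.) are our own assumptions for the example.

```python
# Toy quantitative dialogue metric: repair rate.
# Assumes each turn is a dict with a "speaker" and a dialogue-act label;
# both the representation and the labels are hypothetical.

def repair_rate(turns):
    """Fraction of system turns labelled as repairs or clarifications."""
    system_turns = [t for t in turns if t["speaker"] == "system"]
    if not system_turns:
        return 0.0
    repairs = sum(1 for t in system_turns if t["act"] == "repair")
    return repairs / len(system_turns)

# A four-turn example transcript (annotations invented for illustration).
transcript = [
    {"speaker": "user",   "act": "request"},  # "A flight to Boston."
    {"speaker": "system", "act": "repair"},   # "Did you say Boston?"
    {"speaker": "user",   "act": "confirm"},  # "Yes."
    {"speaker": "system", "act": "inform"},   # "There are three flights..."
]

print(repair_rate(transcript))  # 1 repair out of 2 system turns -> 0.5
```

Even so simple a ratio requires the manual dialogue-act annotation that, as noted above, has so far been carried out only on limited amounts of material; the difficulty lies less in the arithmetic than in agreeing on the annotation scheme.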