Users of this handbook are likely to have needs which fall into three distinct categories. First, end user organisations will wish to compare existing interactive dialogue systems in order to select the best solution for some particular purpose. Second, system engineers will wish to gauge the performance of existing systems for diagnostic purposes, in order to improve their performance. Third, system designers will wish to learn how to go about designing an interactive dialogue system from scratch. These three needs are addressed below.
It is notoriously difficult to compare existing dialogue systems, as they should really be compared under exactly the same conditions. In practice, comparison depends on the degree of system integration. Dialogue managers (as distinct from dialogue systems) should ideally be evaluated independently of the speech technology (recogniser and synthesiser) and of the application domain. In practice this is rarely possible: dialogue prediction and correction procedures, for instance, depend heavily on the performance of the recogniser and its linguistic analysis components. A dialogue system is also rarely completely independent of the application domain or, at least, of a class of applications. Even for the same application, the interface may differ (for air-traffic control training, for instance, there exist different air-traffic simulators with different levels of complexity). Interfaces between the system and the speech technologies on the one hand, and between the system and the application on the other, are not at present general-purpose; adaptation is always necessary. Complete systems developed for the same application domain could, however, be assessed on corpora of similar complexity, corresponding to the same pre-defined scenarios. But as such systems have different internal architectures, with different actual components (which need not coincide with abstract components), only a black box assessment can be envisaged.
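The idea of black box assessment on shared scenarios can be illustrated with a minimal sketch. All names here (`Scenario`, `run_black_box`, the system interface as a turn-by-turn function) are illustrative assumptions, not part of any particular evaluation framework: each system under comparison is driven with the same scripted user turns and judged only on its observable outcome, never on its internal architecture.

```python
# Hypothetical sketch of a black box comparison harness. Every system
# is treated as an opaque function from a user turn to a system reply;
# only the final observable outcome is scored.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Scenario:
    name: str
    user_turns: List[str]                # scripted user inputs, identical for all systems
    goal_reached: Callable[[str], bool]  # judges the system's final reply


def run_black_box(system: Callable[[str], str],
                  scenarios: List[Scenario]) -> Dict[str, bool]:
    """Feed each scenario's turns to the system and record task success."""
    results: Dict[str, bool] = {}
    for sc in scenarios:
        reply = ""
        for turn in sc.user_turns:
            reply = system(turn)
        results[sc.name] = sc.goal_reached(reply)
    return results
```

Because the harness touches nothing inside the system, it applies equally to systems with entirely different internal architectures, which is precisely why black box assessment is the only option in that situation.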
This chapter aims to make these issues accessible to people who may lack extensive experience in speech and language technology and who wish to compare existing systems.
Work on an existing system may aim either to enhance its performance (overall, or in particular components), or to render it more independent of the speech technologies or of the application.
By outlining a framework for testing, respecifying and enhancing systems, this chapter provides a way into this complex problem.
A background activity to designing new systems is to assess existing systems to the limits of their capabilities (the maximum number of words, for instance), assigning limit values to their variables (vocabulary size, number of semantic frames or concepts, etc.). The results of evaluating systems which deal with similar tasks will also be of considerable relevance here. In addition, designing new systems assumes that several analyses have been done beforehand, based on the following procedures:
These, and other related tasks, are explained in this chapter in the context of system specification and design, along with some detailed procedures for progressing from an initial goal to a final working system (see also Chapter 2).
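The notion of assigning limit values to a system variable can be sketched as a simple capacity probe. This is only an illustration under stated assumptions: `accuracy_at` stands in for whatever measurement is run at a given setting (here, vocabulary size), and the threshold and step sizes are arbitrary.

```python
# Hypothetical sketch of probing a system to its limits: grow one
# variable (here, vocabulary size) until measured accuracy falls
# below an acceptance floor, and report the last size that held up.
from typing import Callable


def find_limit(accuracy_at: Callable[[int], float],
               floor: float = 0.90,
               start: int = 100,
               step: int = 100,
               max_size: int = 10_000) -> int:
    """Return the largest probed vocabulary size whose accuracy stays >= floor."""
    best = 0
    size = start
    while size <= max_size:
        if accuracy_at(size) >= floor:
            best = size      # this setting is still acceptable
            size += step
        else:
            break            # first failure: the limit has been passed
    return best
```

The same loop applies unchanged to any other variable with a monotone effect on performance, such as the number of semantic frames or concepts.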
It is important to understand exactly the nature of the technology with which this chapter is concerned, and to master the technical terms which will crop up again and again throughout the chapter. These needs are addressed in Section 13.2, on Interactive dialogue systems.
Interactive dialogue systems are highly complex systems, incorporating many different technologies. Section 13.3, Specification and design, reviews some of the approaches which have been adopted to the problem of specifying and designing such systems. This section concentrates on specifying the functionality of interactive dialogue systems. Detailed recommendations based on practical experiences of workers in the field are included.
Once an interactive dialogue system has been specified, designed and implemented, assessing how well it performs is non-trivial. Section 13.4, Evaluation, looks at what makes the problem difficult, describes a framework within which evaluation may take place, and suggests a core set of metrics which can be used for comparing different interactive dialogue systems.
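To make the idea of a core metric set concrete, the following sketch computes two measures commonly applied to dialogue logs: task success rate and mean number of turns per dialogue. The log format (one `(turns, completed)` pair per dialogue) is an assumption for illustration only; the actual metric set is the subject of Section 13.4.

```python
# Hypothetical sketch of a core metric set over dialogue logs.
# Each dialogue is summarised as (number_of_turns, task_completed).
from typing import Dict, List, Tuple


def core_metrics(dialogues: List[Tuple[int, bool]]) -> Dict[str, float]:
    """Compute task success rate and mean turns across a set of dialogues."""
    n = len(dialogues)
    successes = sum(1 for _, completed in dialogues if completed)
    total_turns = sum(turns for turns, _ in dialogues)
    return {
        "task_success_rate": successes / n,  # fraction of completed tasks
        "mean_turns": total_turns / n,       # average dialogue length
    }
```

Because both measures are computed from externally observable logs, they can be collected for any system evaluated on the same scenarios, which is what makes them usable for cross-system comparison.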