An interactive dialogue system is constructed to enable and support communication between a human user and the service offered by the system. It integrates a set of modules, each of which handles a complex task. The modules are linked to each other, and their interactions are controlled by a kernel module whose overall task is to manage the dialogue. Seen from the dialogue manager, the application functions as an external module (e.g. a remotely functioning database) connected to a human user who may have a number of input and output devices at their disposal.
A dialogue manager may be able to handle several input and output devices in parallel: a user may interact with the dialogue system using multimodal input and output, and several input devices may be used in transferring the same message to the system, for example, DTMF (touch tones) instead of speech input.
Users communicate with the system in a number of transactions. A transaction consists of a number of exchanges, each of which consists of an input utterance (or a sequence of DTMF signals from a touch tone input device) and the corresponding system response (e.g. synthetic or canned speech, or text on a screen). The attention of the interactive dialogue system alternates in a sequence of turns.
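The nesting of these units can be sketched as a simple data model. All class and field names below are illustrative placeholders, not terminology from any particular system:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Turn:
    """One contribution by one party: the user or the system."""
    speaker: str   # "user" or "system"
    content: str   # utterance text, a DTMF string, or screen text

@dataclass
class Exchange:
    """A user input paired with the corresponding system response."""
    user_turn: Turn
    system_turn: Turn

@dataclass
class Transaction:
    """A complete interaction with the service: a sequence of exchanges."""
    exchanges: List[Exchange] = field(default_factory=list)

# A one-exchange transaction: the user asks, the system answers.
t = Transaction(exchanges=[
    Exchange(Turn("user", "When does the next train to Paris leave?"),
             Turn("system", "The next train to Paris leaves at 14:05.")),
])
```

A longer transaction would simply append further exchanges; the alternation of `speaker` values within each exchange is what the text calls the sequence of turns.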
A number of basic terms from interactive dialogue are introduced here:
Now that these basic terms have been defined, we shall consider how interactive dialogue systems compare with command systems, and shall review some issues relating to the different levels of interactive complexity to be found in dialogue systems.
In command systems, the interaction is direct and deterministic: to one stimulus from one agent there corresponds one unique response from the other agent, the response being independent of the state or context of either agent. For example, you press a key on a keyboard and the expected character appears on the screen. With command systems, the human has direct control over the machine. This form, not normally considered a variety of human communication, is usually referred to as the tool metaphor.
A dialogue system can be considered as a kind of interface which mediates communication between a human being and an application system, which may itself include several other systems. The dialogue system must process two kinds of information: that coming from the user and that coming from the task itself, through specialised interfaces, one for the speech technologies and one for the application. One of the dialogue system's main activities is to maintain coherence between the two. The connection between a human being's action (a natural language utterance, for instance) and the response of the system is therefore not direct: the dialogue system must perform a number of internal actions in order to produce a response which is not unique but depends on the internal state of the system and on the context of the interaction. This form of communication is referred to as the agent metaphor or the advisor metaphor.
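The contrast between the two metaphors can be made concrete in a few lines. A minimal sketch, in which the command system is a stateless lookup and the dialogue agent carries internal state (all names and behaviours here are invented for illustration):

```python
# Tool metaphor: one stimulus maps to one unique response,
# independent of any state or context.
def command_response(key: str) -> str:
    return {"a": "A", "b": "B"}.get(key, "?")

# Agent metaphor: the response to the same stimulus depends on
# the system's internal state and the context of the interaction.
class DialogueAgent:
    def __init__(self) -> None:
        self.state = "greeting"   # internal dialogue state

    def respond(self, utterance: str) -> str:
        if self.state == "greeting":
            self.state = "open"
            return "Welcome. How can I help you?"
        return f"You said: {utterance!r}. What else?"

agent = DialogueAgent()
r1 = agent.respond("hello")   # answered from the greeting state
r2 = agent.respond("hello")   # same stimulus, different response
```

The command function always returns the same character for the same key, whereas the agent gives two different responses to the identical utterance "hello", because its internal state changed between the two turns.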
Dialogue systems include different comprehension levels relating to basic components: a recogniser, a parser, an interpretation module, a dialogue manager, a synthesiser, etc. Each of the modules requires associated knowledge bases (lexicons, rules and models concerning the language used, the system, the task, the user, the environment, the dialogue itself). Each of the models has both static and dynamic parts: the static part exists before the dialogue begins; the dynamic part is built and modified during the dialogue. One important component is the dialogue history, which keeps track of the previous exchanges. The different modules and their associated knowledge bases allow the dialogue manager (or system) to perform internal actions including the following:
The different comprehension levels involved (acoustic, phonetic, lexical, syntactic, semantico-pragmatic) may be addressed sequentially. Alternatively, information transfers may take place in parallel between different levels in a non-hierarchical fashion, depending on the dialogue situation.
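The sequential alternative can be sketched as a chain of stand-in stages feeding a dialogue history. Every stage body below is a trivial placeholder for the real module (recogniser, parser, interpreter, response generation), chosen only to make the data flow runnable:

```python
class DialogueManager:
    """Toy sequential pipeline; each method stands in for a real module."""

    def __init__(self) -> None:
        # Dynamic knowledge: the dialogue history is built during dialogue.
        self.history = []

    def recognise(self, audio: str) -> str:
        return audio.lower()            # stand-in for speech recognition

    def parse(self, text: str) -> dict:
        return {"words": text.split()}  # stand-in for syntactic analysis

    def interpret(self, parse: dict) -> str:
        return " ".join(parse["words"])  # stand-in for interpretation

    def respond(self, meaning: str) -> str:
        reply = f"I understood: {meaning}"
        self.history.append((meaning, reply))  # record the exchange
        return reply

    def handle(self, audio: str) -> str:
        """Pass one input sequentially through all comprehension levels."""
        return self.respond(self.interpret(self.parse(self.recognise(audio))))
```

In a non-hierarchical design the stages would instead exchange partial hypotheses in both directions (e.g. the dialogue manager feeding predictions back to the recogniser), which a simple function chain like this cannot express.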
The role and performance of the dialogue system are largely constrained by and therefore dependent on the performance of speech technologies (depending on the recogniser error rate and authorised vocabulary, or on the control parameters of the synthesiser, for instance). They are also greatly dependent on the task objectives and requirements.
Different interactive complexity levels in dialogue systems may be identified. These are described in the following sections.
The interaction is reduced to a question-answer user-interface. The dialogue model is merged into the task model, from which it cannot be distinguished. Dialogues of this kind are often represented by branching tree structures. This category includes interactive voice response (IVR) systems, integrating tone signalling, isolated word recognition and word spotting techniques. The dialogue is strictly guided, leaving very little initiative to the user (system utterances may in some cases be interrupted by the user, for example). Several exchanges may be necessary to provoke one action or to obtain information from the system. This latter feature distinguishes these systems from pure voice control or command language systems in which there is no dialogue.
A question/answer system is a particular limiting case, as it may either be considered as a command system or as a marginal dialogue system: if one particular question always provokes the same response whatever the situation, then the system may be considered as a command system. But if asking the same question can provoke different responses (in menu-driven dialogue systems, for instance, it may depend on the current level in a tree structure), then the system can be called an interactive dialogue system.
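The menu-driven case can be illustrated with a toy branching tree in which the response to a key press depends on the current level. The menu contents and node names are invented for the example:

```python
# A toy IVR menu tree: each node has a prompt, and DTMF keys
# name the child node they lead to.
menu = {
    "root":       {"prompt": "Main menu: press 1 for timetables.",
                   "1": "timetables"},
    "timetables": {"prompt": "Timetables: press 1 for departures.",
                   "1": "departures"},
    "departures": {"prompt": "The next departure is at 14:05."},
}

def step(node: str, key: str) -> str:
    """Return the next node for a key press; stay put on invalid input."""
    return menu[node].get(key, node)
```

Pressing the same key "1" at the root leads to the timetables menu, while pressing "1" again leads to the departures announcement: the same stimulus provokes different responses depending on the current level in the tree, which is what makes such a system a (marginal) dialogue system rather than a command system.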
The system possesses distinct and independent models for the task, for the user, for the system, and for the dialogue itself. The dialogue model takes context into account, using a particular knowledge base (a dialogue history), which is built during the dialogue. Multiple types of reference (anaphora, ellipsis) may be processed. The system may be capable of reasoning, of error or incoherence detection and internal correction, and of anticipation and prediction.
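One use of the dialogue history is reference resolution. The following is a deliberately simplistic sketch of one possible strategy (resolve a pronoun to the most recently mentioned entity); real systems use far richer models, and every name here is hypothetical:

```python
# Dialogue history: entities mentioned in previous exchanges,
# most recent last (a stand-in for a full dialogue history).
history = []

def mention(entity: str) -> None:
    history.append(entity)

def resolve(utterance: str) -> str:
    """Replace the pronoun 'it' with the last entity mentioned."""
    if "it" in utterance.split() and history:
        return utterance.replace("it", history[-1])
    return utterance

mention("the 14:05 train")
resolved = resolve("when does it arrive")
```

Without the history built up during the dialogue, the pronoun in the second utterance could not be interpreted at all, which is why such context cannot live in the static part of the models.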
In this case, the complexity of the spoken language dialogue is compounded by the fact that the result of speech recognition has to be merged with information delivered by other means of communication (media). The dialogue is itself dependent on the system model. Each piece of information delivered by a medium must be timestamped, since the different media do not process information at the same speed, and the dialogue manager has to take event chronology into account.
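The chronology requirement can be shown with a small sketch: events arriving from different media carry timestamps, and the dialogue manager merges them into one time-ordered stream before fusion. The event contents below (a spoken deictic utterance plus two pointing gestures) are an invented example:

```python
from dataclasses import dataclass

@dataclass
class Event:
    medium: str       # "speech", "pointing", "touch", ...
    timestamp: float  # seconds since the start of the interaction
    payload: str

def merge_by_chronology(*streams):
    """Merge per-medium event streams into one time-ordered sequence."""
    return sorted((e for s in streams for e in s), key=lambda e: e.timestamp)

speech  = [Event("speech", 1.40, "put that there")]
gesture = [Event("pointing", 1.10, "object#7"),
           Event("pointing", 1.65, "location#3")]
timeline = merge_by_chronology(speech, gesture)
```

Only after this reordering can the manager align "that" with the gesture that preceded the utterance and "there" with the one that followed it; processing each medium's stream in arrival order would lose that alignment.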
The first category of systems (menu systems) is now used in several real-world application domains (enquiries about cinema programmes, travel timetables, bank accounts, etc.). Most applications deployed in the field work over the telephone and are used by the general public. Members of the two other categories are mostly still industrial and laboratory prototypes, which still impose a lot of constraints (such as a training phase, and a quiet environment) on the user. However, this position is steadily changing as more advanced interactive systems come to be deployed in the field.