Iterative design methodology for spoken language dialogue systems


Most of the spoken language dialogue systems created so far (SUNDIAL, VODIS, PAROLE, etc.) have drawn on analysis of real and simulated dialogues before implementation began. These data have, of course, been augmented by designers' intuitions to fill genuine gaps in the data. For example, observation and simulation corpora in the travel information domain might not mention all the destinations contained in the timetable. The design should not be so tied to the data that such deficits cannot easily be rectified. However, intuitions should be used with caution, so as not to equip the system with functionality which it will never need. Experience has shown that the expectation that some linguistic form might occur is not in itself sufficient grounds for supposing that it will occur.

Normal practice is to design several successive versions of the system, each version benefiting from technology improvements and from analysis of the results of earlier stages.

Interactive voice response systems: recommendations

Recommendations on design methodology

Designing a simple system-led menu-style small vocabulary interactive voice response system consists of the following steps, taking both human linguistic behaviour and speech technology performance into account.

1. Study the application domain and define what the tasks to be achieved are and what steps they consist of.
2. Translate the sequence of subtasks into a sequence of questions to be asked by the system and answered by the user, interleaved where necessary with system-internal operations such as database lookup.
3. Define the exact wording of the system prompts, and the exact vocabularies and language models which are appropriate for each recognition step.
4. Draw up a full specification of the IVR system, integrating the dialogue flow, system-internal operations, prompting and recognition constraints.
5. Design a first version (X) of the dialogue system.
6. Conduct laboratory tests with available technology, using test corpora where available and also laboratory staff simulating users.
7. Conduct field trials with real users, recording new corpora where deemed useful.
8. "Tune" the system by iteratively modifying, then testing it.
9. If too many modifications are necessary, respecify and reimplement the system.
10. Design an X+1 version of the system, integrating new technologies.
11. Carry out new laboratory tests with the new version.
12. Carry out field trials with real users.
13. Return to step 9 unless the system is deemed to be complete.
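
Steps 2 to 4 above can be illustrated with a minimal sketch of a system-led dialogue: a fixed sequence of questions, each with its own small vocabulary, followed by a system-internal lookup. Everything in it is an assumption made for illustration: the travel task, the prompts, the vocabularies, the timetable entries and the `recognise` stand-in, which simply checks typed words rather than recognising speech.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Step:
    """One system question and the small vocabulary active while it is asked."""
    name: str
    prompt: str
    vocabulary: List[str]


# Hypothetical travel-information task, broken into system-led questions.
STEPS = [
    Step("destination", "Which city do you want to travel to?",
         ["london", "paris", "rome"]),
    Step("day", "On which day do you want to travel?",
         ["monday", "tuesday", "wednesday"]),
]

# Hypothetical timetable standing in for the system-internal database lookup.
TIMETABLE = {("paris", "monday"): "09:15", ("rome", "tuesday"): "14:40"}


def recognise(utterance: str, vocabulary: List[str]) -> Optional[str]:
    """Stand-in for a recogniser: accept only words in the active vocabulary."""
    word = utterance.strip().lower()
    return word if word in vocabulary else None


def run_dialogue(user_turns: List[str]) -> str:
    """Ask each question in order, re-asking until the answer is recognised."""
    turns = iter(user_turns)
    answers: Dict[str, str] = {}
    for step in STEPS:
        while step.name not in answers:
            utterance = next(turns, None)
            if utterance is None:
                # The user stopped answering before all questions were filled.
                return "Sorry, I did not get all the information I need."
            value = recognise(utterance, step.vocabulary)
            if value is not None:
                answers[step.name] = value
    departure = TIMETABLE.get((answers["destination"], answers["day"]))
    if departure is None:
        return "There is no train on that day."
    return f"The train leaves at {departure}."
```

Calling `run_dialogue(["berlin", "rome", "tuesday"])` re-asks the first question after the out-of-vocabulary answer and then completes the lookup. Keeping the prompt and vocabulary together in one `Step` is one way of making the specification in step 4 explicit; other layouts would serve equally well.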

Recommendations on prompt design

Prompt design is especially important for IVR systems. Since the user has to follow the system's lead, that lead must be clear, unambiguous, and reassuring. The following recommendations summarise some simple steps which can be taken to achieve an effective prompting regime.

Keep prompts as brief as possible without being terse.
Keep prompts as simple as possible.
Use a consistent linguistic style for prompts.
Ensure that each prompt (except the last) finishes with an explicit question or command.
Wherever technically possible, allow users to interrupt the prompt.
Where prompt interruption is not possible, either ensure that the recogniser starts listening the instant the prompt stops playing, or use an audible signal to indicate when speech may begin.
If prompts are canned, either use a single speaker or, if more than one is used, ensure that each speaker serves an intuitively distinct function.
Do not expect instructions presented to the user at the start of a dialogue to be remembered in subsequent turns.
Wherever possible, re-promptings after errors or absence of user input should provide extra guidance to help the user behave in the desired fashion.
Control variables such as prompt voice quality to give the system a warm and friendly "personality".
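
The recommendation on re-prompting can be sketched as a list of prompts for a single question, ordered from the briefest to the most explicit, with the system moving one step along the list after each failed attempt. The prompt wordings and the names `DAY_PROMPTS` and `prompt_for_attempt` are hypothetical; this is one possible arrangement, not a prescribed one.

```python
from typing import List

# Hypothetical prompts for one question, from briefest to most explicit.
DAY_PROMPTS = [
    "On which day do you want to travel?",
    "Sorry, I did not understand. Please say a day of the week.",
    "Please say just one word, for example Monday or Tuesday.",
]


def prompt_for_attempt(prompts: List[str], failures: int) -> str:
    """Pick the prompt for this attempt.

    `failures` counts earlier unsuccessful attempts at the same question.
    Once the most explicit prompt is reached, it is repeated.
    """
    return prompts[min(failures, len(prompts) - 1)]
```

With this arrangement the first attempt uses the short question, and each error or silence yields a prompt that gives the user more guidance than the previous one.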

Spoken language dialogue systems: recommendations

Recommendations on design methodology

Designing a spoken language dialogue system consists of the following steps, taking both human linguistic behaviour and speech technology performance into account.

1. Study human-human interaction recordings in a situation similar to the one in which the system will be used, and make an ergonomic analysis of the needs or requirements of potential users.
2. Carefully define a Wizard-of-Oz simulation, making objectives explicit.
3. Conduct Wizard-of-Oz simulations (preferably using an iterative WOZ methodology) and record the complete resulting dialogues.
4. Transcribe the dialogues recorded in simulations (several levels of transcription may be necessary). If possible, use a standard transcription scheme.
5. Draw up a specification of the interactive dialogue system.
6. Design and implement a first version (X) of the dialogue system.
7. Conduct laboratory tests with available technology using corpora recorded in Wizard-of-Oz simulations, and then with laboratory staff simulating users, recording new data.
8. Conduct field tests with real users, recording new corpora.
9. "Tune" the system by iteratively modifying, then testing it.
10. If too many modifications are necessary, carry out new (bionic or human) Wizard-of-Oz experiments in which different parameters can be controlled.
11. Design and implement an X+1 version of the system, integrating new technologies.
12. Carry out new laboratory tests with the new version.
13. Carry out field tests with real users.
14. Return to step 9 unless the system is deemed to be complete.
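
One simple laboratory check on transcribed Wizard-of-Oz data is to measure how much of the corpus the current lexicon covers before tuning begins. The sketch below is an illustration under stated assumptions: the function names, the whitespace tokenisation and the punctuation stripping are all simplifications introduced here, and a real transcription scheme would need its own tokeniser.

```python
from typing import List, Set

PUNCTUATION = ".,?!"


def tokens_of(transcripts: List[str]) -> List[str]:
    """Lower-case, whitespace-split word tokens with edge punctuation removed."""
    tokens = []
    for line in transcripts:
        for word in line.lower().split():
            token = word.strip(PUNCTUATION)
            if token:
                tokens.append(token)
    return tokens


def out_of_vocabulary_words(transcripts: List[str], lexicon: Set[str]) -> List[str]:
    """Corpus words missing from the lexicon, in order of first appearance."""
    missing: List[str] = []
    for token in tokens_of(transcripts):
        if token not in lexicon and token not in missing:
            missing.append(token)
    return missing


def coverage(transcripts: List[str], lexicon: Set[str]) -> float:
    """Fraction of word tokens in the corpus that the lexicon covers."""
    tokens = tokens_of(transcripts)
    if not tokens:
        return 1.0  # an empty corpus has nothing left uncovered
    return sum(token in lexicon for token in tokens) / len(tokens)
```

Re-running such a check on each newly recorded corpus gives a rough indication of whether a revision has narrowed the gap between what users say and what the system expects.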

Additional recommendations

In addition to these methodological guidelines, the spoken language dialogue specification/design process can be expected to be simplified and improved if a few extra recommendations are adhered to. (Many of these summarise points already made in the preceding discussion.)

Where time and other resources allow, base the specification on data from a diversity of sources.
Consult human-human data to learn about the task and to understand the service expectations which users will bring to the system.
Conduct WOZ simulations to determine the effect of human-computer factors for a specific task or application domain.
Use native speaker intuitions to fill obvious gaps in the human-human and WOZ corpora, but avoid going beyond this.
Use an iterative refinement methodology to perfect the specification.
Allow sufficient time and resources for the specification process.
Decide in advance which questions to ask of the data, and wherever possible stick to them.
Conduct a dialogue act analysis of the dialogues collected in the corpora, paying special attention to the conditions which must be satisfied in order to proceed from one dialogue state to the next.
Describe the dialogue state transitions using some formally explicit apparatus (such as a flowchart or formal specification language).
Use the data to identify the total lexicon required, then divide it into sublexicons, where each sublexicon is associated with a dialogue act.
Use the data to identify a covering grammar, then divide it into subgrammars, where each subgrammar is associated with a dialogue act.
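
The last few recommendations can be sketched together: an explicit table of dialogue state transitions, with one sublexicon per dialogue act. The act names, the words and the transitions below are hypothetical; the sketch shows one formally explicit notation among many, and treats each dialogue state as a single dialogue act for simplicity.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical dialogue acts, each with its own sublexicon.
SUBLEXICONS: Dict[str, Set[str]] = {
    "request_destination": {"london", "paris", "rome"},
    "request_day": {"monday", "tuesday", "wednesday"},
    "confirm": {"yes", "no"},
}

# The total lexicon is the union of the sublexicons.
TOTAL_LEXICON: Set[str] = set().union(*SUBLEXICONS.values())

# Explicit transition table: (current state, outcome) -> next state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("request_destination", "filled"): "request_day",
    ("request_day", "filled"): "confirm",
    ("confirm", "yes"): "done",
    ("confirm", "no"): "request_destination",
}


def classify(state: str, word: str) -> Optional[str]:
    """Map a word to an outcome, or None if it is outside the active sublexicon."""
    word = word.lower()
    if word not in SUBLEXICONS[state]:
        return None
    # In the confirm state the word itself ("yes" or "no") is the outcome.
    return word if state == "confirm" else "filled"


def next_state(state: str, word: str) -> str:
    """Advance only when the condition for leaving the current state is met."""
    outcome = classify(state, word)
    if outcome is None:
        return state  # condition not satisfied: stay and ask again
    return TRANSITIONS[(state, outcome)]
```

Because only the sublexicon of the current state is consulted, a word such as "yes" is rejected while a destination is being requested, which is the practical point of dividing the lexicon by dialogue act.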

Human reactions to spoken language dialogue systems have to be observed on the spot. The ideal approach is therefore to design systems in close collaboration with professional organisations which have groups of potential users who are willing to critique specification documents, participate in early trials, and feed back useful comments.


EAGLES SWLG SoftEdition, May 1997.