
Wizard of Oz (WOZ)


A description of WOZ has been included in Chapters 4 and 13. Here we ask what the technique is useful for. The simple answer is that it sets up a simulation of a dialogue system, which allows the implications of a required system to be tested without committing the investigator to its actual implementation. WOZ is itself an experimental procedure. Its advantage is its flexibility in comparison with a computational implementation. WOZ design brings the engineer and ergonomist together to realise a complex task. It should be borne in mind that WOZ is a ``means to an end'', not ``an end''. Ultimately, if WOZ has done the job expected of it, it will be discarded and a system implemented. The expectations of both engineer and ergonomist need to be clearly specified at the outset.

The engineer expects the WOZ simulation to provide answers about what the structure of the dialogue system should eventually look like, and should therefore be expected to provide the information necessary to get the ergonomist started. In particular, the ergonomist needs a model for the language used in the dialogue system and on-going advice about any proposed changes. A potential problem at the outset is whether this initial specification of language is seen by the engineer as the core of the work. After initial experimentation the ergonomist will need to go back to the language engineer and request alterations to this and other aspects of the simulation.

It is necessary to realise that for WOZ to be useful it has to be cheaper than the direct implementation of a system. It should also be a more efficient tool, given the comparative flexibility it offers. The engineer should make the ergonomist aware of exactly how much time is allowed for development and testing, what call they can have on the engineer's time and expertise, etc. A ``searching in the dark'' strategy on the part of the ergonomist is unsatisfactory, and an adequate procedure will need to be carefully negotiated.

The aim is to devise experimental procedures that will allow the engineer and ergonomist to achieve their ultimate goals. The major practical requirement of WOZ is that it provides answers or proposals about what the ultimate system should be like. The major methodological question is somewhat different: how good is the simulation? Statistical and experimental procedures will be focussed on that question.

Audio-only simulations are described first, followed by some brief comments on current developments in multimodal systems.

Audio-only simulations


The language engineer should provide a description of the user dialogue which supports the activities required for the proposed task. A representation such as a state transition diagram would be appropriate. A second requirement is a performance specification, i.e. a description of exactly what the device is supposed to do.
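As an illustration of the kind of state transition representation mentioned above, the following minimal sketch encodes a dialogue as a table of transitions. The task (a timetable enquiry), the state names and the user acts are all hypothetical and are not taken from any particular system; a real description would come from the language engineer.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueModel:
    """State transition description of a dialogue (all names are hypothetical)."""
    start: str
    # Maps (current state, user act) to the next state.
    transitions: dict = field(default_factory=dict)

    def next_state(self, state, user_act):
        # An act with no transition leaves the dialogue in the same state,
        # which corresponds to the system repeating its prompt.
        return self.transitions.get((state, user_act), state)

    def run(self, user_acts):
        """Return the sequence of states visited for a sequence of user acts."""
        state = self.start
        visited = [state]
        for act in user_acts:
            state = self.next_state(state, act)
            visited.append(state)
        return visited


# Hypothetical timetable enquiry dialogue.
model = DialogueModel(
    start="greeting",
    transitions={
        ("greeting", "request_timetable"): "ask_destination",
        ("ask_destination", "give_destination"): "confirm",
        ("confirm", "yes"): "give_answer",
        ("confirm", "no"): "ask_destination",
    },
)

path = model.run(["request_timetable", "give_destination", "no",
                  "give_destination", "yes"])
print(path)
```

A table of this kind can be handed to the wizard as the set of permitted moves, and the recorded dialogues can later be checked against it.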

Once provided with these basics, the ergonomist needs to establish what factors might limit the subjects' ability to work on the assumption that the dialogue is with a machine and, correspondingly, what factors in the task (the job of the wizard) are likely to support or undermine this pretence.


Subject variables

Care must be taken to ensure that the observations are made under conditions which will elicit representative performance: the vocabulary, device users and operating environment should be as similar as possible to those of the device being simulated. This includes making errors similar to those which might occur in the actual device.
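One way in which device-like errors might be introduced is to corrupt the wizard's transcription before it is processed. The sketch below is an assumption made for illustration only: it models substitution errors alone, at a fixed rate, whereas the errors of a real recogniser would need to be estimated from the device being simulated. The function name `inject_errors` and the example vocabulary are hypothetical.

```python
import random


def inject_errors(words, vocabulary, error_rate, rng):
    """Replace each word by a different vocabulary word with probability error_rate.

    Only substitution errors are modelled; this is a simplifying assumption.
    """
    corrupted = []
    for word in words:
        alternatives = [v for v in vocabulary if v != word]
        if alternatives and rng.random() < error_rate:
            corrupted.append(rng.choice(alternatives))
        else:
            corrupted.append(word)
    return corrupted


vocabulary = ["london", "leeds", "luton", "today", "tomorrow"]
utterance = ["leeds", "tomorrow"]

# A seeded generator makes a session repeatable, which helps when
# comparing subjects who should meet the same error conditions.
rng = random.Random(0)
unchanged = inject_errors(utterance, vocabulary, 0.0, rng)
all_wrong = inject_errors(utterance, vocabulary, 1.0, rng)
```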

A typical set of instructions to the people trying to create the simulation might be as follows:

You are required to transcribe all utterances of the user subject onto the computer using the keyboard. Speak aloud what is displayed by the computer, and your utterances will be transmitted back to the subject. Be careful to read out what appears on the screen, and not just repeat what the subject said. Although your speech will be distorted to make it sound like synthesised speech, try to minimise the inflections in your voice and to speak as consistently as possible, in order to enhance the ``mechanical'' effect.



Wizard variables

The requirements associated with the wizard are really those of a good experimental procedure (Section 9.3). The output should be consistent in content, style and pace. Two examples are:

Since the job of being a wizard is not easy, wizards may need to be trained to produce predefined replies or menus, etc.

A factor likely to determine variability in wizards is the level of skill exhibited by the system subject, which will include fatigue and individual differences in aptitude. The first factor may be controlled by recruiting wizards who are likely candidates for developing these skills, and by training. Individual differences will be eliminated if only one wizard is employed; however, this advantage may be reduced if the study is large in scope (see the section on multimodal simulations).

Since the cognitive load is high, two-wizard configurations have been used in recent studies: one wizard performs the I/O (receives the questions and produces the answers), and the second performs ``task level processing'' (generates the answers to be formulated by the I/O wizard). It is considered that the two-wizard setup is more likely to achieve consistency without increasing response time, though these claims need verifying experimentally.

A final important recommendation is that there should be a permanent record of performance. To this end, questions and answers should be tape-recorded.  
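Where the wizard's keyboard input is already held on the computer, a written log could be kept alongside the audio recording. The sketch below is one possible layout, not a prescribed format: the `Turn` record and its field names are hypothetical. Timing each turn also gives a simple check on how consistent the wizard's pace has been.

```python
import json
from dataclasses import dataclass, asdict
from statistics import mean, pstdev


@dataclass
class Turn:
    """One question and answer pair (field names are hypothetical)."""
    question: str
    answer: str
    asked_at: float      # seconds since the start of the session
    answered_at: float   # seconds since the start of the session


def response_times(turns):
    """Time taken by the wizard to answer each question, in seconds."""
    return [t.answered_at - t.asked_at for t in turns]


def to_json_lines(turns):
    """Serialise the session, one JSON object per line, for permanent storage."""
    return "\n".join(json.dumps(asdict(t)) for t in turns)


session = [
    Turn("When is the next train to Leeds?", "At ten fifteen.", 0.0, 2.0),
    Turn("And the one after that?", "At eleven fifteen.", 10.0, 13.0),
]

times = response_times(session)
average = mean(times)    # average response time
spread = pstdev(times)   # a small spread suggests a consistent pace
record = to_json_lines(session)
```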


Multimodal simulations

Future interactive systems may require input from more than one modality; examples would be speech input to generate visual text, or voice-operated drawing programmes. When WOZ techniques are employed for these applications, the extra factors that need to be considered are:

TASK COMPLEXITY: The more modalities, the more functions need to be simulated.
INFORMATION BANDWIDTH: There are many ways of providing input. In addition, the input load may be too high for a single wizard, so that the wizard's behaviour becomes inconsistent. In such cases, multi-wizard configurations would be essential.
MULTI-WIZARD CONFIGURATIONS: The need for multi-wizard configurations raises issues about how to organise collaboration between the wizards. Ideally, the workload should be spread equally. This is difficult, since it depends on the subjects' behaviour and, thus, the roles of the wizards may need to change dynamically. For this reason a third (supervisory) wizard may be needed.

Coutaz and associates have spent time developing recommendations for wizard collaboration. They structure a wizard's task in three steps:
  1. Acquisition (analysis of message).
  2. Interpretation (what the subject's response means in connection with the task faced).
  3. Formulation (the emission of an answer).
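
The three steps above can be pictured as a simple pipeline. The sketch below is only an illustration of the decomposition, assuming a hypothetical voice-operated drawing task; the function names, the keyword table and the replies are invented for the example and do not reproduce any published implementation. In a real study each step is carried out by a human wizard, and the steps might be shared between wizards.

```python
def acquire(raw_message):
    """Acquisition: analyse the incoming message (here, normalise and tokenise)."""
    return raw_message.lower().strip(" .?!").split()


def interpret(tokens, task_keywords):
    """Interpretation: relate the message to the task (here, a keyword lookup)."""
    for token in tokens:
        if token in task_keywords:
            return task_keywords[token]
    return "unknown"


def formulate(intent, replies):
    """Formulation: emit an answer (here, a predefined reply)."""
    return replies.get(intent, replies["unknown"])


# Hypothetical tables for a voice-operated drawing programme.
TASK_KEYWORDS = {"circle": "draw_circle", "square": "draw_square"}
REPLIES = {
    "draw_circle": "Drawing a circle.",
    "draw_square": "Drawing a square.",
    "unknown": "Please repeat your request.",
}


def handle(raw_message):
    """Run the three steps in sequence for one message from the subject."""
    tokens = acquire(raw_message)
    intent = interpret(tokens, TASK_KEYWORDS)
    return formulate(intent, REPLIES)


reply = handle("Draw a circle.")
fallback = handle("Make it bigger!")
```

Using predefined replies in the formulation step is one way of keeping the output consistent in content and style.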


EAGLES SWLG SoftEdition, May 1997.