Characterisation

Characterisation is an essential part of the system evaluation process: what is being evaluated, and all the conditions under which the evaluation takes place, must be defined precisely. We therefore characterise here the dialogue system, the task, the user, the environment, the corpus, and the overall system.

Dialogue system characterisation

Dialogue systems may be characterised by several parameters that define their complexity; these are listed by category below.

Knowledge databases

Language model:: the model used by the recogniser and shared by the system in order to guide the recognition process, whenever necessary.
Recogniser complexity:: isolated word, word spotting, continuous speech (error-free read text or spontaneous with repairs, hesitations, ill-formed or incomplete sentences).
Lexicon:: the list of allowed words.
Phonological rules:: rules of pronunciation of the words.
Syntactic rules:: descriptions of the well-formed linguistic constructions recognised by the system.
Semantic-pragmatic representation:: the list of concepts used with associated structures (frames, conceptual graphs, etc.).
Task model:: plans and scenarios, goals and subgoals, representation of the objects of the task and of their characteristics. (For example, in an air-traffic control domain, known objects would include planes with parameters such as heading, level, etc., and known goals would include authorising route adjustments, etc.)
Dialogue grammar:: hierarchy of subdialogues with dialogue act leaf nodes, indicating permitted progression from one turn type to the next.
User model:: particular rules of pronunciation (confusion matrix), linguistic behaviour (particular formulations), user beliefs and knowledge about the task, etc.
System model:: the list of the media available, with a description of their characteristics.
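
As an illustration of the "permitted progression from one turn type to the next" that a dialogue grammar encodes, the following minimal sketch checks a sequence of turns against a table of allowed successors. The turn-type names and the flat table are assumptions made for this example only; a full dialogue grammar would be a hierarchy of subdialogues rather than a single transition table.

```python
# Hypothetical turn types and successors; not an inventory taken from the text.
PERMITTED_NEXT = {
    "greeting": {"question"},
    "question": {"answer", "clarification_request"},
    "clarification_request": {"clarification"},
    "clarification": {"answer"},
    "answer": {"question", "closing"},
    "closing": set(),
}


def is_permitted(sequence, grammar=PERMITTED_NEXT):
    """Return True if every adjacent pair of turn types is allowed by the grammar."""
    return all(
        nxt in grammar.get(cur, set())
        for cur, nxt in zip(sequence, sequence[1:])
    )


well_formed = is_permitted(["greeting", "question", "answer", "closing"])
ill_formed = is_permitted(["greeting", "answer"])
```

Here `well_formed` is true and `ill_formed` is false, because an answer may not directly follow a greeting in this illustrative table.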

Dialogue strategies

The following are alternative classes of dialogue strategy which may be adopted. Considerable scope exists for further subclassification here.

Strictly guided and deterministic:: no initiative left to the user. IVR systems typically fall into this category.
Cooperative:: includes correction and prediction mechanisms, shares initiative with the user, accepts interruptions or negotiation, and is capable of clarifying the system's choices and responses (turn-taking is balanced between the user and the system).
Constitutive:: (for educational systems) the system has to learn new notions in its normal operation.
Adaptive:: takes into account the dynamic user model by learning the user's communicative strategies and adjusting to them as each dialogue proceeds.

Task characterisation

There is an intimate connection between the application domain and associated tasks, the language required to accomplish these tasks in this domain, and the design of a system which supports dialogues for this purpose.

Task type

A broad categorisation of task types can be made, depending on whether the objects of the task are evolving during the dialogue or not. These include the following:

Information access and retrieval:: for example, train or flight time table enquiries.
Negotiation:: the system acts as an expert system, trying to find the best solution, for instance the best way to assign conference delegates to hotels, taking hotel costs and proximity to the conference centre into account. (An information retrieval system may also need some kind of negotiation, for example, to obtain a less expensive travel ticket.)
Process control:: the task is evolving, as for instance in communication with a robot.
Training:: knowledge acquisition by the user or by the machine. In such cases as air-traffic control training, the task may be evolving (planes are changing heading or level).
Monitoring:: the system does not play an active part in the dialogue, but monitors its progress and is available to offer assistance when called upon. Such systems are sometimes referred to as computer mediated (or supported or assisted) human-human communication systems. The most notable example of such a system is the VERBMOBIL face-to-face spoken language translation system, currently under development.

Task complexity

Tasks and the dialogues by which these tasks are achieved are more or less mutually dependent. Typically, simple tasks will be solved by means of simple dialogues and complex tasks will be accomplished by means of complex dialogues. Thus an important part of the characterisation of the dialogue system is an index of the complexity of the task or tasks to be addressed by the system. Such indices might include the following subcomponents.

the number of different scenarios covered (i.e. does the system address just one kind of problem or many different kinds?);
the maximum complexity (i.e. in each scenario, what is the maximum number of subgoals which have to be satisfied in order to solve the task problem?). If the task and subtasks can be represented by a tree structure, the complexity may be measured by the depth or width of the hierarchy;
the number of subtasks to be achieved in parallel (especially in multimodal interaction);
the minimum number of exchanges necessary to solve the task problem or complete a plan.
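
Where the task and its subtasks can be represented as a tree, the depth and width measures mentioned above can be computed directly. The sketch below is one possible reading, not a prescribed metric: the `Goal` class, the example task, and the convention that a goal without subgoals has depth 1 are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Goal:
    """A node in a hypothetical task hierarchy."""
    name: str
    subgoals: List["Goal"] = field(default_factory=list)


def depth(goal):
    """Number of levels in the hierarchy; a goal with no subgoals has depth 1."""
    return 1 + max((depth(g) for g in goal.subgoals), default=0)


def width(goal):
    """Largest number of goals found at any single level of the hierarchy."""
    level, widest = [goal], 0
    while level:
        widest = max(widest, len(level))
        level = [g for node in level for g in node.subgoals]
    return widest


def count_subgoals(goal):
    """Total number of subgoals below the top-level goal."""
    return sum(1 + count_subgoals(g) for g in goal.subgoals)


# Illustrative scenario: a travel enquiry with one decomposed subgoal.
task = Goal("book_trip", [
    Goal("choose_route", [Goal("origin"), Goal("destination")]),
    Goal("choose_date"),
    Goal("confirm"),
])

indices = {
    "depth": depth(task),
    "width": width(task),
    "subgoals": count_subgoals(task),
}
```

For this example tree the indices are depth 3, width 3 and 5 subgoals. The number of parallel subtasks and the minimum number of exchanges depend on the dialogue design and are not derivable from the tree shape alone.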

User characterisation

A large number of different criteria must be taken into account when characterising users of interactive dialogue systems. At least the following must be considered:

number of users, for example, a few (10), or numerous (thousands);
age (children, adults). The following age bands are adequate for most assessment purposes: less than 18, 18-25, 25-35, 35-45, 45-55, 55-65, over 65;
sex (female/male);
experience in the use of the automatic system (trained or untrained, experienced or novice, occasional or regular users);
expertise in the application domain (whether or not the user knows what information they want);
status (professionals or members of the general public);
motivation (whether they are real end-users, or paid or unpaid subjects);
physical status (stressed, tired, ill, ...). Under stress in an adverse environment (for instance, in a space shuttle), vibrations, temperature, G-effect, urgency, etc. may affect the user's pronunciation and utterance structure, as well as the way the user conducts the dialogue (the user may wish to complete the task very quickly, for instance).

Environment characterisation

The environment is the total context in which an interactive dialogue system is evaluated. In general, it would be impossible to produce an exhaustive description of an environment, but a restricted set of relevant features can usefully be selected. For example, the following features relating to the acoustic environment might be used.

type and proximity of the microphone used by the recogniser (for example, high quality microphone, close-talking microphone, microphone array, telephone handset, hands-free telephone, etc.);
level of background noise (anechoic chamber, office, street, car, factory);
communication quality (telephone lines: analog/digital).

Other relevant environment features will also have to be developed. As a general rule, the more features of the environment which can be characterised, the better.

Result corpus characterisation

A system evaluation will result in the collection of a corpus of resulting dialogues. Corpora must be fully characterised to ensure that changes in system performance over time can be tracked, and that corpora collected using different systems can be reliably compared.

At least the following features of an evaluation corpus should be noted:

length of the corpus (in terms of elapsed time);
number of different speakers;
number of scenarios per user, number of identical scenarios processed by different users;
length of each scenario or average length;
number of dialogues, utterances, words, etc.;
number of words per utterance (average), etc.;
type of environment in which it has been recorded (and how far this is from the target usage conditions).
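
Most of the features listed above are simple counts that can be derived mechanically once the corpus is stored in a structured form. The following sketch assumes a hypothetical representation in which each dialogue is a dictionary with a speaker, a scenario, a duration and a list of transcribed utterances; the field names, the sample data and the whitespace-based word count are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical corpus layout: one dictionary per recorded dialogue.
corpus = [
    {"speaker": "s1", "scenario": "A", "duration_s": 60.0,
     "utterances": ["I want a train to Paris", "tomorrow morning"]},
    {"speaker": "s2", "scenario": "A", "duration_s": 90.0,
     "utterances": ["when is the next flight", "to Rome", "thank you"]},
    {"speaker": "s1", "scenario": "B", "duration_s": 30.0,
     "utterances": ["cancel my booking"]},
]


def characterise(dialogues):
    """Compute basic descriptive features of an evaluation corpus."""
    utterances = [u for d in dialogues for u in d["utterances"]]
    # Words are counted by splitting on whitespace, a deliberate simplification.
    word_counts = [len(u.split()) for u in utterances]

    speakers_by_scenario = defaultdict(set)
    scenarios_by_speaker = defaultdict(set)
    for d in dialogues:
        speakers_by_scenario[d["scenario"]].add(d["speaker"])
        scenarios_by_speaker[d["speaker"]].add(d["scenario"])

    return {
        "elapsed_time_s": sum(d["duration_s"] for d in dialogues),
        "speakers": len(scenarios_by_speaker),
        "scenarios_per_speaker": {s: len(v) for s, v in scenarios_by_speaker.items()},
        # Scenarios processed by more than one speaker.
        "shared_scenarios": sorted(
            s for s, v in speakers_by_scenario.items() if len(v) > 1
        ),
        "mean_dialogue_length_s": mean(d["duration_s"] for d in dialogues),
        "dialogues": len(dialogues),
        "utterances": len(utterances),
        "words": sum(word_counts),
        "mean_words_per_utterance": mean(word_counts),
    }


stats = characterise(corpus)
```

Recording these figures alongside each corpus makes it easier to compare corpora collected with different systems or at different times. The recording environment still has to be described separately, since it cannot be derived from the transcripts.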

Overall system characterisation

The overall system in which the dialogue system is embedded also needs to be characterised. First, though, it is necessary to clarify some terms.

A mode refers to one of the perceptual senses which allow for communication. The following modes may be identified: vocal, visual, auditory, tactile, olfactory.

Communication means (or media) refer to materials or devices which are used by the dialogue system to communicate with the user.

Communication modalities concern the way the communicating agent/party uses a mode: for speech, different modalities may be identified, for example whether continuous speech or isolated words are used, whether a whispering or shouting style is used, etc.

The system may comprise different communication means which may be characterised by:

number of different media;
media usage supported (in parallel, combined, alternate, etc.).

Each medium has an associated language model and is characterised by:

medium information processing time;
availability;
input/output modalities: for example, for a recogniser, output to the system might be words or sentences; for a synthesiser, input from the system might be sequences of phonemes or conceptual graphs.

This characterisation is particularly important in multimodal dialogues: the system's awareness of the state of each medium (active, available, occupied, etc.) at each step of the dialogue is decisive both for predicting which media the user will use and for choosing appropriate media through which to send information to the user.
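
To make the role of media states concrete, the sketch below records a state and a processing time for each medium and selects an output medium from those currently available. The three states come from the examples in the text; the `Medium` class, the sample media and the "fastest available medium" policy are assumptions chosen for illustration, not a strategy described by the source.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    AVAILABLE = "available"
    OCCUPIED = "occupied"


@dataclass
class Medium:
    """A hypothetical record of one communication medium."""
    name: str
    state: State
    processing_time_s: float  # medium information processing time


def choose_output_medium(media):
    """Return the fastest medium that is currently available, or None."""
    candidates = [m for m in media if m.state is State.AVAILABLE]
    return min(candidates, key=lambda m: m.processing_time_s, default=None)


media = [
    Medium("speech synthesiser", State.OCCUPIED, 0.2),
    Medium("screen", State.AVAILABLE, 0.5),
    Medium("printer", State.AVAILABLE, 3.0),
]

chosen = choose_output_medium(media)
```

With the synthesiser occupied, `chosen` is the screen. A real system would presumably also weigh the kind of information to be conveyed and the user's preferences, which this sketch ignores.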

EAGLES SWLG SoftEdition, May 1997.