Characterisation is an essential part of the system evaluation
process. It is vital to define precisely what is being
evaluated, together with all the conditions under which the evaluation
takes place. We therefore devote attention here to characterising the
dialogue system, the task, the user, the environment, the corpus, and
the overall system.
Dialogue systems may be characterised by several parameters that define
their complexity; these are listed by category below.
- Language model: the model used by the recogniser and shared by
the system in order to guide the recognition process, whenever
necessary.
- Recogniser complexity: isolated word, word spotting, continuous
speech (error-free read text or spontaneous with repairs, hesitations,
ill-formed or incomplete sentences).
- Lexicon: the list of allowed words.
- Phonological rules: rules of pronunciation of the words.
- Syntactic rules: descriptions of the well-formed linguistic
constructions recognised by the system.
- Semantic-pragmatic representation: the list of concepts used with
associated structures (frames, conceptual graphs, etc.).
- Task model: plans and scenarios, goals and subgoals, representation of
the objects of the task and of their characteristics. (For example, in
an air-traffic control domain, known objects would include planes with
parameters such as heading, level, etc., and known goals would include
authorising route adjustments, etc.).
- Dialogue grammar: hierarchy of subdialogues
with dialogue act leaf nodes, indicating permitted progression
from one turn type to the next.
- User model: particular rules of pronunciation
(confusion matrix),
linguistic behaviour (particular formulations), user beliefs and
knowledge about the task, etc.
- System model: the list of the media available, with
a description of their characteristics.
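The dialogue grammar item above describes permitted progression from one turn type to the next. The sketch below illustrates only that aspect: it flattens the hierarchy of subdialogues into a table of permitted successors. The dialogue act labels are hypothetical and are not taken from any particular system.

```python
# Hypothetical dialogue-act labels; the text does not prescribe a set.
PERMITTED_NEXT = {
    "greeting": {"question"},
    "question": {"answer", "clarification_request"},
    "clarification_request": {"clarification"},
    "clarification": {"answer"},
    "answer": {"question", "closing"},
    "closing": set(),
}


def is_well_formed(turns):
    """Return True if each turn type may follow the previous one."""
    for previous, current in zip(turns, turns[1:]):
        if current not in PERMITTED_NEXT.get(previous, set()):
            return False
    return True


print(is_well_formed(["greeting", "question", "answer", "closing"]))
print(is_well_formed(["greeting", "answer"]))
```

A fuller representation would keep the subdialogue hierarchy explicit, for example as a tree whose leaves are dialogue acts.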
The following are alternative classes of dialogue strategy which may be
adopted. Considerable scope exists for further subclassification here.
- Strictly guided and deterministic: no initiative left to the
user. IVR systems typically fall into this category.
- Cooperative: includes correction and prediction mechanisms,
shares initiative with the user, accepts interruptions or negotiation,
and is capable of clarifying the system's choices and responses
(turn-taking is balanced between the user and the system).
- Constitutive: (for educational systems) the system has to learn
new notions in its normal operation.
- Adaptive: takes into account the dynamic user model by learning the
user's communicative strategies and adjusting to them as each dialogue
proceeds.
There is an intimate connection between the application domain and
associated tasks, the language required to accomplish these tasks in
this domain, and the design of a system which supports dialogues for
this purpose.
A broad categorisation of task types can be made, depending on
whether the objects of the task are evolving during the dialogue or
not. These include the following:
- Information access and retrieval: for example, train or
flight timetable enquiries.
- Negotiation: the system acts as an expert system, trying to find the
best solution, for instance the best way to assign conference delegates
to hotels, taking hotel costs and proximity to the conference centre
into account. (An information retrieval system may need some kind of
negotiation, for example, to obtain a less expensive travel ticket).
- Process control: the task is evolving, as for instance in
communication with a robot.
- Training: knowledge acquisition by the user or by the machine.
In such cases as air-traffic control training, the task may
be evolving (planes are changing heading or level).
- Monitoring: the system does not play an active part in the
dialogue, but monitors its progress and is available to offer
assistance when called upon. Such systems are sometimes referred to as
computer mediated (or supported or assisted) human-human communication
systems. The most notable example of such a system is the VERBMOBIL
face-to-face spoken language translation system, currently under
development.
Tasks and the dialogues by which these tasks are achieved are more or
less mutually dependent. Typically, simple tasks will be solved by means
of simple dialogues and complex tasks will be accomplished by means of
complex dialogues. Thus an important part of the characterisation of
the dialogue system is an index of the complexity of the task or tasks
to be addressed by the system. Such an index might include the following
subcomponents:
- the number of different scenarios covered (i.e. does the system
address just one kind of problem or many different kinds?);
- the maximum complexity (i.e. in each scenario, what is the maximum
number of subgoals which have to be satisfied in order to solve the task
problem?). The complexity may be measured by the depth or width of the
hierarchy, if the task and subtasks can be represented by a tree
structure;
- the number of subtasks to be achieved in parallel (especially in
multimodal interaction);
- the minimum number of exchanges necessary to solve the task problem or
complete a plan.
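Where the task and its subtasks can be represented as a tree, the depth and width measures mentioned above can be computed directly. The following is a minimal sketch under that assumption; the `TaskNode` class and the timetable scenario are illustrative only and are not part of any standard index.

```python
from dataclasses import dataclass, field


@dataclass
class TaskNode:
    """A goal or subgoal in a hypothetical task tree."""
    name: str
    subgoals: list = field(default_factory=list)


def depth(node):
    """Number of levels from this node down to its deepest leaf."""
    if not node.subgoals:
        return 1
    return 1 + max(depth(child) for child in node.subgoals)


def width(node):
    """Largest number of nodes found on any single level of the tree."""
    widest = 0
    level = [node]
    while level:
        widest = max(widest, len(level))
        level = [child for parent in level for child in parent.subgoals]
    return widest


def count_subgoals(node):
    """Total number of subgoals below this node."""
    return sum(1 + count_subgoals(child) for child in node.subgoals)


# Illustrative scenario only: a train timetable enquiry.
enquiry = TaskNode("timetable enquiry", [
    TaskNode("get route", [TaskNode("get origin"), TaskNode("get destination")]),
    TaskNode("get date"),
    TaskNode("report departures"),
])

print(depth(enquiry), width(enquiry), count_subgoals(enquiry))  # prints: 3 3 5
```

Reporting these figures per scenario, together with the number of scenarios covered, gives a simple basis for comparing the task complexity addressed by different systems.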
A large number of different criteria must be taken into account when
characterising users of interactive dialogue systems. At least the
following must be considered:
- number of users, for example, a few (10) or numerous (thousands);
- age (children, adults). The following age
bands are adequate for most assessment purposes: less than 18, 18-25, 25-35,
35-45, 45-55, 55-65, over 65;
- sex (female/male);
- experience in the use of the automatic system (trained or untrained,
experienced or novice, occasional or regular users);
- expertise in the application domain (whether or not the user knows
what information is wanted);
- status (professionals or members of the general public);
- motivation (whether they are real end-users, or paid or unpaid
subjects);
- physical status (stressed, tired, ill, ...). When stressed by an
adverse environment (for instance, in a space shuttle), the user might
be affected by vibrations, temperature, G-effect, urgency, etc., both in
pronunciation and utterance structure and in the way of conducting the
dialogue (the user may wish to complete the task very quickly, for
instance).
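The age bands listed above share their boundary values (25 appears in both 18-25 and 25-35), so any tool that records them has to decide where a boundary age belongs. The sketch below assumes that a boundary age falls in the upper band; this convention is an assumption and is not stated in the text.

```python
from bisect import bisect_right

# Lower bounds of every band after the first, and the band labels from the text.
BOUNDS = [18, 25, 35, 45, 55, 65]
LABELS = ["less than 18", "18-25", "25-35", "35-45", "45-55", "55-65", "over 65"]


def age_band(age):
    """Map an age in whole years to one of the bands listed above.

    Assumption: a boundary age falls in the upper band (25 -> "25-35",
    65 -> "over 65"); the text does not say how boundaries are resolved.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    return LABELS[bisect_right(BOUNDS, age)]


print(age_band(17), "|", age_band(25), "|", age_band(70))
```

Whichever convention is chosen, it should be recorded with the user characterisation so that results from different evaluations remain comparable.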
The environment is the total context in which an interactive dialogue
system is evaluated.
In general, it would be impossible to produce an exhaustive
description of an environment, but a restricted set of relevant features
may usefully be selected. For example, the following features relating
to the acoustic environment might be used:
- type and proximity of the microphone used by the recogniser
(for example, high quality microphone, close-talking microphone,
microphone array, telephone handset, hands-free telephone, etc.);
- level of background noise (anechoic
chamber, office, street, car, factory);
- communication quality (telephone lines: analog/digital).
Other relevant environment features will also have to be developed. As
a general rule, the more features of the environment that can be
characterised, the better.
A system evaluation will result in the collection of a corpus of
dialogues.
Corpora must be fully characterised to ensure that changes in
system performance over time can be tracked, and that corpora
collected using different systems can be reliably compared.
At least the following features of an evaluation corpus should be noted:
- length of the corpus (in terms of elapsed time);
- number of different speakers;
- number of scenarios per user, number of identical scenarios
processed by different users;
- length of each scenario or average length;
- number of dialogues, utterances, words, etc.;
- number of words per utterance (average), etc.;
- type of environment in which it has been recorded (and how far
this is from the target usage conditions).
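Several of the corpus features above are simple counts that can be derived from transcribed utterances. The sketch below shows one way to do so. The `Utterance` record and its field names are hypothetical. Words are counted by whitespace splitting, and elapsed time is taken as the sum of utterance durations, so pauses between turns are ignored. All three are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """One transcribed utterance; the field names are illustrative."""
    dialogue_id: str
    speaker: str
    text: str
    duration_s: float


def corpus_summary(utterances):
    """Compute basic size features of a corpus of utterances."""
    n_utterances = len(utterances)
    n_words = sum(len(u.text.split()) for u in utterances)
    return {
        "elapsed_time_s": sum(u.duration_s for u in utterances),
        "speakers": len({u.speaker for u in utterances}),
        "dialogues": len({u.dialogue_id for u in utterances}),
        "utterances": n_utterances,
        "words": n_words,
        "words_per_utterance": n_words / n_utterances if n_utterances else 0.0,
    }


# Invented sample data, for illustration only.
sample = [
    Utterance("d1", "user_a", "when is the next train to Paris", 2.5),
    Utterance("d1", "user_a", "in the morning", 1.0),
    Utterance("d2", "user_b", "cancel", 0.5),
]
print(corpus_summary(sample))
```

If system turns are stored in the same list, they should be filtered out before counting speakers, or the system will be counted as one.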
The overall system in which the dialogue system is embedded also needs to
be characterised. First, though, it is necessary to clarify some terms.
A mode refers to the perceptual senses which allow for
communication; the following modes may be identified: vocal, visual, auditory,
tactile, olfactory.
Communication means (or media) refer to materials or
devices which are used by the dialogue system to communicate with the user.
Communication modalities concern the way
the communicating agent or party uses a mode.
For speech, different modalities may be identified, for example whether
continuous speech or isolated words are used, whether a whispering or shouting
style is used, etc.
The system may comprise different communication means which may be
characterised by:
- number of different media;
- media usage supported (in parallel, combined, alternate, etc.).
Each medium has an associated language model and is characterised by:
- medium information processing time;
- availability;
- input/output modalities: for example, for a recogniser, output to the system
might be words or sentences; for a synthesiser, input from the
system might be sequences of phonemes or conceptual graphs.
This characterisation is particularly important in multimodal dialogues,
as the system's awareness of the state of each medium (active,
available, occupied, etc.) at each step of the dialogue is decisive
when the system predicts which media the user will employ, or
chooses appropriate media for sending information to the user.
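As a rough illustration of how a system might use its knowledge of media states when choosing an output medium, consider the sketch below. The state names follow the text, but the medium names, the preference order, and the rule that only an "available" medium may be chosen are all assumptions.

```python
from enum import Enum


class MediumState(Enum):
    ACTIVE = "active"
    AVAILABLE = "available"
    OCCUPIED = "occupied"


def choose_output_medium(states, preferences):
    """Return the first preferred medium that is currently available, or None."""
    for medium in preferences:
        if states.get(medium) is MediumState.AVAILABLE:
            return medium
    return None


# Hypothetical media and states at one step of a dialogue.
states = {
    "speech synthesiser": MediumState.OCCUPIED,
    "screen": MediumState.AVAILABLE,
}
print(choose_output_medium(states, ["speech synthesiser", "screen"]))  # prints: screen
```

The same table of states, updated at each step of the dialogue, could also be used to predict which input media the user is able to use next.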
EAGLES SWLG SoftEdition, May 1997.