A requirement of dialogue systems is a set of metrics that measure how successfully a dialogue is transacted. One aspect of this concerns turn-taking between the participants. This section describes different ways in which dialogue interaction has been measured in conversations between two humans.
In dyadic communication between humans, conversation is characterised by turn-taking: in general, one participant, A, talks and stops; another, B, starts, talks and stops; and so we obtain an A-B-A-B-A-B distribution of talk across the two participants. The transition from one speaker to the next has been shown to be orderly, with precise timing and with less than 5% overlap.
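As a rough illustration of how an overlap figure of this kind might be computed, the sketch below takes hand-labelled turn intervals for two speakers and returns the proportion of talk time during which both are speaking. The function name `overlap_fraction`, the interval representation and the choice of denominator (time during which at least one participant talks) are assumptions made for this example; the studies behind the 5% figure may define overlap differently.

```python
def overlap_fraction(turns_a, turns_b):
    """Fraction of talk time during which both speakers talk at once.

    Each argument is a list of (start, end) times in seconds. Turns are
    assumed not to overlap within a single speaker (an assumption of
    this sketch).
    """
    overlap = 0.0
    for a_start, a_end in turns_a:
        for b_start, b_end in turns_b:
            # Length of the intersection of the two intervals, if any.
            overlap += max(0.0, min(a_end, b_end) - max(a_start, b_start))

    talk_a = sum(end - start for start, end in turns_a)
    talk_b = sum(end - start for start, end in turns_b)
    # Time during which at least one participant is talking.
    total = talk_a + talk_b - overlap
    return overlap / total if total > 0 else 0.0


# Hypothetical A-B-A exchange in which B's turn runs 0.1 s into A's next turn.
turns_a = [(0.0, 2.0), (3.9, 6.0)]
turns_b = [(2.1, 4.0)]
fraction = overlap_fraction(turns_a, turns_b)
print(f"overlap: {fraction:.1%}")
```

With the invented intervals above, the overlap is a small fraction of the total talk time, in line with the orderly pattern described.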
Sacks and co-workers [Sacks et al. (1974)] suggest that the mechanism that governs turn-taking, and accounts for the properties noted, is a set of rules with ordered options which operates on a turn-by-turn basis, and can thus be termed a "local management system". One way of looking at this system is as a sharing device operating over a scarce resource, namely control of the "floor". Such an allocational system requires minimal units over which it operates. These units are, in this model, determined by various features of linguistic surface structure: they are syntactic units (sentences, clauses, noun phrases, and so on) identified as turn-units in part by prosodic means.
Other psychologists working on conversation have suggested a different solution to how turn-taking works. According to this other view, turn-taking is regulated primarily by signals, and not by opportunity assignment rules at all. [Duncan (1974)], for example, describes three basic signals for the turn-taking mechanism:
These signals are used and responded to in a relatively structured manner. On such a view, the current speaker will signal when he intends to hand over the floor, and other participants may bid by recognised signals for the right to speak.
A disadvantage of dialogue-based metrics is that, like content analysis, they require time-consuming manual analysis. It would therefore be better if automatic, acoustic-based procedures could be developed. A potential problem for acoustic-based metrics of dialogue interaction is that speakers are often not acoustically isolated from one another. This problem need not arise over telephone connections (and in some available recordings it does not), and so potentially need not arise for many dialogue interaction systems. Such recordings allow acoustic metrics of disruption to be developed, which have the advantage of being automatic.
Little work has been done on this topic. Prosodic factors are a major source of turn-taking cues, and acoustic metrics associated with these factors (amplitude, pitch and duration) have been measured [Howell (1990)].
In this example, speaker A is interrupted (unsuccessfully) by speaker B. The terms used to describe the various components of an interruption are summarised in Figure 9.5. The ordinate represents speech activity: a speaker is talking whenever the trace is above the baseline.
Telephone systems are particularly useful for this work because they permit acoustic isolation of the two dialogue channels, which prevents crosstalk to an extent. For this reason, interruption patterns like those described above are easily computed.
Figure 9.5: Schematic illustration of terms used in connection with speaker interruption patterns
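When each speaker is recorded on a separate channel, interruption patterns of the kind illustrated can be located with very little machinery. The following is a minimal sketch, assuming that each channel has already been reduced to one energy value per frame and that a fixed threshold stands in for the baseline. The function names, the threshold value and the rule used to label an interruption as successful (the interrupter is still talking after the original speaker has stopped) are assumptions made for illustration, not definitions taken from the figure.

```python
def active_frames(energies, threshold):
    """Mark a frame as active when its energy exceeds the threshold."""
    return [energy > threshold for energy in energies]


def segments(active):
    """Turn a per-frame activity list into (start, end) frame pairs, end exclusive."""
    result = []
    start = None
    for index, is_active in enumerate(active):
        if is_active and start is None:
            start = index
        elif not is_active and start is not None:
            result.append((start, index))
            start = None
    if start is not None:
        result.append((start, len(active)))
    return result


def find_interruptions(active_a, active_b):
    """Find points where B starts talking while A is already talking."""
    found = []
    segments_a = segments(active_a)
    for b_start, b_end in segments(active_b):
        for a_start, a_end in segments_a:
            if a_start < b_start < a_end:
                found.append({
                    "onset": b_start,
                    "overlap_frames": min(a_end, b_end) - b_start,
                    # Assumed rule: B "wins" if B outlasts A's current turn.
                    "successful": b_end > a_end,
                })
    return found


# Hypothetical frame energies: A talks for 10 frames, B cuts in for 3 frames.
energy_a = [0.9] * 10 + [0.0] * 2
energy_b = [0.0] * 4 + [0.8] * 3 + [0.0] * 5
threshold = 0.1  # assumed baseline level

events = find_interruptions(active_frames(energy_a, threshold),
                            active_frames(energy_b, threshold))
print(events)
```

In the invented data, B starts at frame 4 and gives up at frame 7 while A carries on, so the single event found is labelled unsuccessful, matching the situation described for the figure. The same routine run with the channels swapped would find points where A interrupts B.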
The types of metrics that can be computed at points of interruption are those associated with prosody: principally amplitude, pitch, segment duration and pauses. Once obtained, these data can be employed to ascertain how these factors are used during dialogue to signal that some response (either from the machine or the human) is expected. If the frequency of occurrence of particular pitch movements at points of interruption is to be compared across speaking tasks, a non-parametric statistical test would be needed, since the data are counts. If acoustic measures in the vicinity of interruptions are to be compared, a parametric statistical test would be appropriate.
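The two kinds of comparison can be sketched as follows. The text does not name particular tests, so the choices here are assumptions: a chi-square statistic on a table of counts stands in for the non-parametric case, and Welch's t statistic on two samples of an acoustic measure stands in for the parametric case. The function names and all data values are hypothetical. Only the test statistics are computed; significance would be read from standard tables or obtained from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for a table of counts.

    Rows might be pitch-movement types and columns speaking tasks.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if movement type and task were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def welch_t_statistic(sample_x, sample_y):
    """Welch's t statistic for two independent samples of a measure."""
    var_x = variance(sample_x) / len(sample_x)
    var_y = variance(sample_y) / len(sample_y)
    return (mean(sample_x) - mean(sample_y)) / sqrt(var_x + var_y)


# Hypothetical counts of rising and falling pitch movements in two tasks.
counts = [[10, 20],
          [20, 10]]
chi2, dof = chi_square_statistic(counts)
print(f"chi-square = {chi2:.2f}, degrees of freedom = {dof}")

# Hypothetical peak amplitudes (dB) near interruptions in two tasks.
task_1 = [61.0, 63.5, 62.2, 64.1]
task_2 = [58.9, 60.2, 59.5, 61.0]
print(f"Welch t = {welch_t_statistic(task_1, task_2):.2f}")
```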