Dialogue partners show a strong ability to synchronise their utterances. Models of rhythmic entrainment can help explain the temporal synchronisation across utterances, that is, the anticipation of temporal windows for backchannels or turn initiations. In this project, we aim to identify the rhythmic-prosodic cues that listeners use to generate hypotheses about the timing of (potential) upcoming dialogue contributions. The goal of the project is to build a model that is descriptively adequate, consistent with cognitively plausible models of rhythmic entrainment, and suitable for integration into an artificial agent.
The precise timing of utterance onsets in dialogue is crucial for natural interaction. Previous analyses have shown that rhythmical structure enables dialogue partners to generate precise timing hypotheses. Dialogue partners show a strong ability to synchronise their speech, a phenomenon often called entrainment. Models of entrainment are potentially useful in explaining the temporal synchronisation across utterances, i.e. they can help listeners anticipate turn ends or temporal windows for backchannels. It is also well known that listeners are strongly guided by their rhythmical expectations when processing speech or multimodal utterances.
In this new project, we aim to identify the rhythmic-prosodic cues that listeners use to generate hypotheses about the timing of (potential) upcoming dialogue contributions. The main goal of the project is to build a model of dialogue timing that is descriptively adequate, consistent with cognitively plausible models of rhythmical entrainment, and implementable in an artificial agent. Temporal models of dialogue timing often fail to account for the dynamic nature of speech rhythm, i.e. the fact that within an utterance, speakers may accelerate, decelerate or change the overall rhythmical pattern. Dynamic models of temporal entrainment provide an explanatory basis for such a dynamic perspective on dialogue timing. These ideas appear to be a promising starting point for improving a previously developed computational model of turn-taking within a dynamic interaction loop. We will collect and analyse semi-spontaneous speech data and use the findings to implement a dynamic model based on adaptive oscillators, which will be integrated into an artificial agent for model evaluation.
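As a rough illustration of how an adaptive oscillator can follow a speaker who accelerates or decelerates, the sketch below tracks a sequence of rhythmic event onsets with a simple linear phase- and period-correction rule and predicts the time of the next event. It is a minimal sketch under stated assumptions, not the model developed in this project: treating the events as, for example, stressed-syllable onsets is an assumption, and the function name `track_onsets` and the correction gains `alpha` (phase) and `beta` (period) are hypothetical.

```python
from typing import List, Tuple


def track_onsets(
    onsets: List[float],
    initial_period: float,
    alpha: float = 0.5,  # hypothetical phase-correction gain (0 = none, 1 = full reset)
    beta: float = 0.2,   # hypothetical period-correction gain (tempo adaptation)
) -> Tuple[List[float], float, float]:
    """Track rhythmic onsets (in seconds) with a linear phase/period-correction oscillator.

    Returns the prediction made before each onset (from the second onset on),
    the predicted time of the next, not yet observed onset, and the final period.
    """
    if not onsets:
        raise ValueError("need at least one onset")
    period = initial_period
    # First expectation: one period after the first observed event.
    expected = onsets[0] + period
    predictions: List[float] = []
    for t in onsets[1:]:
        predictions.append(expected)
        # Positive error: the event came later than expected (speaker slowing down).
        error = t - expected
        # Period correction: adapt the oscillator's tempo to the observed timing.
        period += beta * error
        # Phase correction: shift partly towards the observed event, then advance one period.
        expected = expected + alpha * error + period
    return predictions, expected, period


if __name__ == "__main__":
    # Hypothetical decelerating sequence: inter-onset intervals 0.5, 0.6, 0.7, 0.8 s.
    preds, next_onset, period = track_onsets([0.0, 0.5, 1.1, 1.8, 2.6], initial_period=0.5)
    print("predictions:", [round(p, 3) for p in preds])
    print("next expected onset:", round(next_onset, 3), "adapted period:", round(period, 3))
```

With perfectly isochronous input and a matching initial period, the prediction errors stay at zero and the oscillator simply runs at a constant tempo; with a decelerating sequence, the period estimate grows, which is the kind of within-utterance adaptation that a static timing model would miss. A full model of the kind described above would presumably also need to handle pattern changes and multimodal cues, which this sketch does not attempt.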