Speaker gaze effects on language comprehension (SGELC)

 

Helene Kreysa, PhD


Cognitive Interaction Technology Excellence Cluster (CITEC)

University of Bielefeld


Room H1-132


Morgenbreede 39

33615 Bielefeld

Germany


tel.: +49 (0)521-106 12248


email:

hkreysa(AT)cit-ec.uni-bielefeld.de


Current research project with Pia Knoeferle
last updated 12 September 2011
for a brief summary, see here
Human speakers reliably look at objects they are about to refer to. This project uses eye tracking to explore how comprehenders may benefit from seeing where their interlocutor is looking, based on the idea that the speaker’s gaze provides a valid cue to upcoming speech content. 

Background
In language comprehension, a large body of research has shown that listeners can rapidly integrate the unfolding speech content with information in visual context (e.g., Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995). In these studies, participants hear a sentence either while inspecting an arrangement of real-world objects in front of them, or while viewing a display of semi-realistic clipart objects and depicted actions. However, in a typical setup, the speaker of the sentence is not visible. 
Yet the shifting focus of another person’s gaze can be highly informative for listeners: When a speaker describes entities in the visual world, s/he robustly gazes at objects before mentioning them (Griffin & Bock, 2000). These pre-mention referent-directed looks of a speaker have been shown to facilitate comprehension (e.g., Hanna & Brennan, 2007; Nappa & Arnold, 2009; Nappa, Wessel, McEldoon, Gleitman, & Trueswell, 2009). In a collaborative task, Hanna and Brennan (2007) showed that seeing a human speaker attend to the object s/he was about to mention led listeners to shift attention to a corresponding object in their own workspace, even before it was referred to. Similarly, the “gaze” of a robot speaker helped participants to anticipate linguistically ambiguous referents (Staudte & Crocker, 2011). Speaker gaze thus allows listeners to anticipate what the speaker will talk about next.

Research questions 
While it seems obvious that speaker gaze can benefit listeners’ comprehension, the factors that modulate this benefit and the extent to which it generalises across different settings are the topics of our investigations: 
We have examined whether the usefulness of speaker gaze is independent of sentence difficulty and comprehension task. We compared speaker gaze effects on the processing of German sentences that were either easy (subject-initial) or difficult to understand (object-initial, e.g., Hemforth, 1993; Knoeferle & Crocker, 2009). If speaker gaze is beneficial across the board, we should see similar effects for both sentence structures and also for different comprehension tasks.
In previous studies, the speaker always faced the listener fully frontally, a situation which occurs for instance at a sales counter or in frontal classroom teaching (and in the example image above). Yet in many other situations – e.g., browsing a shop window with a friend or discussing a shared piece of work with a colleague – a fully frontal view of the other’s eyes is not available. Since the accuracy of gaze detection decreases when viewer and gazer are positioned at an angle to each other (e.g., Cline, 1967; Gibson & Pick, 1963), it is important to establish whether the benefit of seeing the speaker actually generalises to such situations. For this reason, our stimuli are videos showing a speaker looking at a screen from a sideways perspective.
Ultimately, a better understanding of speaker gaze effects in a variety of speaker-listener settings and across different sentence types and ambiguities will help to extend existing processing accounts of situated comprehension with a speaker model (Knoeferle & Crocker, 2006; 2007), and to inform the construction of human-computer interfaces (e.g., Kopp, Jung, Leßmann, & Wachsmuth, 2003; Poggi, Pelachaud, & De Rosis, 2000). 

Experimental paradigm
In a typical version of our paradigm, participants see three Second Life® characters that are referred to in an auditorily presented German sentence, as well as (in half the trials) the speaker producing this sentence and looking at the characters. Here is an example of a still from one of the stimulus videos, including the speaker:   

Each sentence describes an action involving the central character (e.g., the waiter) and one of the two outer characters on the screen (e.g., a millionaire or a saxophone player). These sentences use either subject-verb-object (SVO) structure, as in (1), or object-verb-subject (OVS) structure, as in (2).
(1)	Der Kellner beglückwünscht den Millionär. “the waiter (subject/agent) is congratulating the millionaire (object/patient)”
(2)	Den Kellner beglückwünscht der Millionär. This OVS sentence structure is not available in English: the meaning is the same as “the waiter (patient) is being congratulated by the millionaire (agent)”, but the sentence is active, not passive.
All videos begin with the speaker looking at the camera and then inspecting each character in turn. Just before producing the sentence, she fixates the central character again, who is always mentioned first. During the sentence, she shifts gaze from the first-mentioned character (e.g., Kellner in both sentence structures) to the referent of the second noun phrase, the target character (Millionär). 
In the four experimental studies conducted to date (Experiments 1-4), we measured participants’ reaction times in a range of comprehension tasks (e.g., identifying referents, judging sentence difficulty or sentence role relations). Experiments 2-4 also recorded their eye movements on the screen throughout the entire trial. The main eye movement analyses focus on the relative number of fixations to the target character between the speaker’s gaze shift and the onset of the second noun phrase, as well as on mean fixation onset times per condition.
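As a rough illustration, the two window-based eye movement measures described above could be computed along the following lines. This is a minimal sketch under stated assumptions, not the analysis code used in the experiments: the `Fixation` record, the condition labels, the region names, the timing values, and the `summarise` function are all hypothetical, and fixations are assigned to the window by their onset time only.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Fixation:
    trial: int       # trial identifier
    condition: str   # hypothetical label, e.g. "gaze" or "nogaze"
    region: str      # hypothetical region name, e.g. "target"
    onset_ms: float  # fixation onset relative to sentence onset


def summarise(fixations, windows):
    """Summarise fixations per condition.

    `windows` maps trial -> (gaze_shift_ms, np2_onset_ms), i.e. the interval
    between the speaker's gaze shift and the onset of the second noun phrase.
    Returns, per condition, the proportion of in-window fixations that landed
    on the target and the mean onset of each trial's first such fixation.
    """
    in_window = defaultdict(int)
    on_target = defaultdict(int)
    first_onset = {}  # (condition, trial) -> earliest in-window target onset

    for f in fixations:
        start, end = windows[f.trial]
        if not (start <= f.onset_ms < end):
            continue  # fixation began outside the analysis window
        in_window[f.condition] += 1
        if f.region == "target":
            on_target[f.condition] += 1
            key = (f.condition, f.trial)
            first_onset[key] = min(first_onset.get(key, f.onset_ms), f.onset_ms)

    onsets = defaultdict(list)
    for (condition, _trial), onset in first_onset.items():
        onsets[condition].append(onset)

    return {
        condition: {
            "target_proportion": on_target[condition] / n,
            "mean_target_onset_ms": mean(onsets[condition]) if onsets[condition] else None,
        }
        for condition, n in in_window.items()
    }


# Invented toy data: one trial per condition, window from 500 ms to 1500 ms.
windows = {1: (500, 1500), 2: (500, 1500)}
fixations = [
    Fixation(1, "gaze", "first_np", 300),     # before the window, ignored
    Fixation(1, "gaze", "target", 700),
    Fixation(1, "gaze", "target", 1100),
    Fixation(2, "nogaze", "competitor", 600),
    Fixation(2, "nogaze", "target", 1300),
    Fixation(2, "nogaze", "target", 1600),    # after the window, ignored
]
summary = summarise(fixations, windows)
```

In this toy data set, the "gaze" trial has an earlier first target fixation and a higher share of target fixations than the "nogaze" trial, which is the kind of pattern such a summary is meant to make visible; any real analysis would of course aggregate over participants and items and test the differences statistically.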

Key findings to date
Using this paradigm, we are able to show that
1.	even viewed from an angle, speaker gaze can rapidly influence visual attention in a listener: listeners looked at the target character earlier when the speaker was visible than when she was not;
2.	effects of speaker gaze on visual attention are not independent of incremental syntactic structure building and thematic role assignment: with speaker gaze, listeners showed a stronger tendency to anticipate the target character early for “easy” subject-initial sentences than for object-initial sentences;
3.	effects of speaker gaze on visual attention are sensitive to small differences between comprehension tasks (e.g., identifying the characters vs. locating the experiencer of the action).
However, although speaker gaze substantially facilitates a listener's visual attention during comprehension, the effects are short-lived and do not affect post-comprehension response times, at least in the tasks we have used so far. 


I will be adding more information here as the project proceeds, but at present the most up-to-date summaries are our 6-page papers this year at 
EuroCogSci in Sofia, May 21-24 [pdf]
CogSci in Boston, July 20-23 [pdf]

Acknowledgments
This research was funded by the Cognitive Interaction Technology Excellence Cluster (German Research Foundation, DFG). We thank Eva Mende, Linda Krull, Anne Kaestner, Lydia Diegmann, and Eva Nunnemann for their assistance with preparing the stimuli and/or collecting data.

References
Cline, M. G. (1967). The perception of where a person is looking. American Journal of Psychology, 80, 41-50.
Gibson, J. J., & Pick, A. D. (1963). Perception of another person's looking behavior. American Journal of Psychology, 76, 386-394.
Griffin, Z. M., & Bock, K. (2000). What the eyes say about speaking. Psychological Science, 11, 274-279.
Hanna, J. E., & Brennan, S. E. (2007). Speakers' eye gaze disambiguates referring expressions early during face-to-face conversation. Journal of Memory and Language, 57, 596-615. 
Hemforth, B. (1993). Kognitives Parsing: Repräsentation und Verarbeitung sprachlichen Wissens. Sankt Augustin: Infix.
Knoeferle, P., & Crocker, M. W. (2006). The coordinated interplay of scene, utterance, and world knowledge: Evidence from eye tracking. Cognitive Science, 30, 481-529.
Knoeferle, P., & Crocker, M. W. (2007). The influence of recent scene events on spoken comprehension: Evidence from eye movements. Journal of Memory and Language, 57, 519-543.
Knoeferle, P., & Crocker, M. W. (2009). Constituent order and semantic parallelism in online comprehension: Eye-tracking evidence from German. The Quarterly Journal of Experimental Psychology, 62, 2338-2371.
Kopp, S., Jung, B., Leßmann, N., & Wachsmuth, I. (2003). Max - A multimodal assistant in virtual reality construction. Künstliche Intelligenz, 4/03, 11-17.
Nappa, R., & Arnold, J. (2009). Paying attention to intention: Effects of intention (but not egocentric attention) on pronoun resolution. Paper presented at the 22nd Annual CUNY Conference on Human Sentence Processing. 
Nappa, R., Wessel, A., McEldoon, K. L., Gleitman, L. R., & Trueswell, J. C. (2009). Use of speaker's gaze and syntax in verb learning. Language Learning and Development, 5, 203-234.
Poggi, I., Pelachaud, C., & De Rosis, F. (2000). Eye communication in a conversational 3D synthetic agent. AI Communications, 13, 169-181.
Staudte, M., & Crocker, M. W. (2011). Investigating joint attention mechanisms through spoken human-robot interaction. Cognition, 120, 268-291.
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268, 1632-1634.