Towards efficient human-machine collaboration: effects of gaze-driven feedback and engagement on performance

Nikolina Mitev; Patrick Renner; Thies Pfeiffer; Maria Staudte

doi:10.1186/s41235-018-0148-x

Cogn Res Princ Implic. 2018 Dec 29;3(1):51. doi: 10.1186/s41235-018-0148-x.

Authors

Nikolina Mitev¹, Patrick Renner², Thies Pfeiffer², Maria Staudte³

Affiliations

¹ CITEC, Universität des Saarlandes, Campus C7.4 (2.04), Saarbrücken, 66123, Germany. nikkol@coli.uni-saarland.de.
² CITEC, Bielefeld University, Inspiration 1, Bielefeld, 33619, Germany.
³ CITEC, Universität des Saarlandes, Campus C7.4 (2.04), Saarbrücken, 66123, Germany.

Abstract

Referential success is crucial for collaborative task-solving in shared environments. In face-to-face interactions, humans, therefore, exploit speech, gesture, and gaze to identify specific objects. We investigate whether and how the gaze behavior of a human interaction partner can be used by a gaze-aware assistance system to improve referential success. Specifically, our system describes objects in the real world to a human listener using on-the-fly speech generation. It continuously interprets listener gaze and implements alternative strategies to react to this implicit feedback. We used this system to investigate which strategy is optimal for task performance: providing an unambiguous, longer instruction right from the beginning, or starting with a shorter, yet ambiguous instruction. Further, the system provides gaze-driven feedback, which can be either underspecified ("No, not that one!") or contrastive ("Further left!"). As expected, our results show that ambiguous instructions followed by underspecified feedback are not beneficial for task performance, whereas contrastive feedback results in faster interactions. Interestingly, this approach even outperforms unambiguous instructions (manipulation between subjects). However, when the system alternates between underspecified and contrastive feedback to initially ambiguous descriptions in an interleaved manner (within subjects), task performance is similar for both approaches. This suggests that listeners engage more intensely with the system when they can expect it to be cooperative. This, rather than the actual informativity of the spoken feedback, may determine the efficiency of information uptake and performance.
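To make the difference between the two feedback types concrete, here is a minimal, hypothetical Python sketch. It is not the authors' implementation. It assumes a one-dimensional horizontal layout where each object has an x coordinate, and that the listener's current fixation has already been resolved to a single object (or to none). The names `SceneObject` and `gaze_feedback` are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical scene object; x grows from left to right (assumed layout)."""
    name: str
    x: float


def gaze_feedback(target: SceneObject,
                  fixated: Optional[SceneObject],
                  strategy: str) -> Optional[str]:
    """Return a spoken correction for the current fixation, or None if none is needed."""
    if fixated is None or fixated == target:
        # No object fixated, or the listener already looks at the target.
        return None
    if strategy == "underspecified":
        # Signals only that the fixated object is wrong.
        return "No, not that one!"
    if strategy == "contrastive":
        if target.x == fixated.x:
            # Direction is undefined on a 1-D layout; fall back to a plain rejection.
            return "No, not that one!"
        # Direct the listener from the fixated distractor towards the target.
        return "Further left!" if target.x < fixated.x else "Further right!"
    raise ValueError(f"unknown strategy: {strategy!r}")
```

For example, with a target at `x=0.2` and the listener fixating a distractor at `x=0.8`, the underspecified strategy only rejects the fixated object, whereas the contrastive strategy also tells the listener where to look next ("Further left!").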

Keywords: Human–computer interaction; Listener gaze; Multimodal systems; Natural language generation; Referential success.

Grants and funding

EXC 284-2/Multimodal Computing and Interaction Cluster of Excellence at Saarland University