Understanding Multimodal User Gesture and Speech Behavior for Object Manipulation in Augmented Reality Using Elicitation

IEEE Trans Vis Comput Graph. 2020 Dec;26(12):3479-3489. doi: 10.1109/TVCG.2020.3023566. Epub 2020 Nov 10.

Abstract

The primary objective of this research is to understand how users manipulate virtual objects in augmented reality using multimodal interaction (gesture and speech) and unimodal interaction (gesture). Through this understanding, natural-feeling interactions can be designed for this technology. These findings are derived from an elicitation study employing a Wizard of Oz design, aimed at developing user-defined multimodal interaction sets for building tasks in 3D environments using optical see-through augmented reality headsets. The modalities tested were gesture and speech combined, gesture only, and speech only. The study was conducted with 24 participants. The canonical referents for translation, rotation, and scale were used, along with some abstract referents (create, destroy, and select). A consensus set of gestures for these interactions is provided. Findings include the types of gestures performed, the timing between co-occurring gestures and speech (130 milliseconds), perceived workload by modality (using NASA TLX), and design guidelines arising from this study. Multimodal interaction, in particular gesture and speech interaction for augmented reality headsets, is essential as this technology becomes the future of interactive computing. It is possible that in the near future, augmented reality glasses will become pervasive.
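As an illustration only (not taken from the paper), the reported ~130 millisecond offset between co-occurring gesture and speech suggests how a multimodal system might decide whether two input events belong to the same command. The sketch below, in Python, assumes hypothetical event fields, a hypothetical fusion window, and function names chosen for this example; the paper itself does not prescribe an implementation.

```python
# Illustrative sketch (not from the paper): fusing a gesture event and a speech
# event into one multimodal command using a temporal co-occurrence window.
# The ~130 ms figure is the mean gesture-speech offset reported in the abstract;
# the window size, event fields, and names below are assumptions for illustration.

from dataclasses import dataclass
from typing import Optional

MEAN_GESTURE_SPEECH_OFFSET_S = 0.130   # reported mean offset between modalities
FUSION_WINDOW_S = 0.500                # hypothetical tolerance for co-occurrence

@dataclass
class ModalityEvent:
    modality: str      # "gesture" or "speech"
    label: str         # e.g. "pinch-and-drag", "rotate", "make it bigger"
    onset: float       # seconds since session start

def fuse(gesture: ModalityEvent, speech: ModalityEvent) -> Optional[dict]:
    """Return a fused multimodal command if the two events co-occur within
    the fusion window; otherwise None (treat each as a unimodal command)."""
    offset = abs(gesture.onset - speech.onset)
    if offset <= FUSION_WINDOW_S:
        return {"gesture": gesture.label, "speech": speech.label, "offset_s": offset}
    return None

if __name__ == "__main__":
    g = ModalityEvent("gesture", "pinch-and-drag", onset=12.40)
    s = ModalityEvent("speech", "move the cube here", onset=12.53)  # ~130 ms later
    print(fuse(g, s))  # events fall inside the window, so they are fused
```

Such a window-based fusion rule is only one possible design choice; the study's contribution is the empirical timing data and consensus gesture set, not any particular fusion algorithm.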

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Adolescent
  • Adult
  • Augmented Reality*
  • Behavior / classification*
  • Computer Graphics*
  • Female
  • Gestures*
  • Humans
  • Male
  • Speech*
  • User-Computer Interface
  • Virtual Reality
  • Young Adult