ARGUS: Visualization of AI-Assisted Task Guidance in AR

Sonia Castelo; Joao Rulff; Erin McGowan; Bea Steers; Guande Wu; Shaoyu Chen; Iran Roman; Roque Lopez; Ethan Brewer; Chen Zhao; Jing Qian; Kyunghyun Cho; He He; Qi Sun; Huy Vo; Juan Bello; Michael Krone; Claudio Silva

doi:10.1109/TVCG.2023.3327396

IEEE Trans Vis Comput Graph. 2023 Nov 2:PP. doi: 10.1109/TVCG.2023.3327396. Online ahead of print.

PMID: 37917526

Abstract

The concept of augmented reality (AR) assistants has captured the human imagination for decades, becoming a staple of modern science fiction. To pursue this goal, it is necessary to develop artificial intelligence (AI)-based methods that simultaneously perceive the 3D environment, reason about physical tasks, and model the performer, all in real time. Within this framework, a wide variety of sensors are needed to generate data across different modalities, such as audio, video, depth, speech, and time-of-flight. The required sensors are typically part of the AR headset, providing performer sensing and interaction through visual, audio, and haptic feedback. AI assistants not only record the performer as they perform activities, but also require machine learning (ML) models to understand and assist the performer as they interact with the physical world. Therefore, developing such assistants is a challenging task. We propose ARGUS, a visual analytics system to support the development of intelligent AR assistants. Our system was designed as part of a multi-year collaboration between visualization researchers and ML and AR experts. This co-design process has led to advances in the visualization of ML in AR. Our system allows for online visualization of object, action, and step detection as well as offline analysis of previously recorded AR sessions. It visualizes not only the multimodal sensor data streams but also the output of the ML models. This allows developers to gain insights into the performer's activities as well as the ML models, helping them troubleshoot, improve, and fine-tune the components of the AR assistant.