Action recognition based on multimode fusion for VR online platform

Xuan Li; Hengxin Chen; Shengdong He; Xinrun Chen; Shuang Dong; Ping Yan; Bin Fang

doi:10.1007/s10055-023-00773-4

Virtual Real. 2023 Feb 24:1-16. doi: 10.1007/s10055-023-00773-4. Online ahead of print.

Authors

Xuan Li¹, Hengxin Chen¹, Shengdong He¹, Xinrun Chen¹, Shuang Dong¹, Ping Yan¹, Bin Fang¹

Affiliation

¹ College of Computer Science, Chongqing University, Chongqing, 400044 China.

Abstract

Popular online communication platforms currently convey information only as text, voice, pictures, and other electronic media, so the richness and reliability of the information do not match those of traditional face-to-face communication. Online communication through virtual reality (VR) technology is a viable alternative. On current VR online communication platforms, users appear in a virtual world as avatars, which achieves "face-to-face" communication to a certain extent. However, the avatar's actions do not follow the user's, which makes communication less realistic. Decision-makers also need to base decisions on the behavior of VR users, but there are no effective methods for collecting action data in VR environments. In our work, we collect three modalities of data for nine actions performed by VR users, using the built-in sensors of a virtual reality head-mounted display (VR HMD), RGB cameras, and human pose estimation. Using these data and advanced multimodal fusion action recognition networks, we obtain a high-accuracy action recognition model. In addition, we use the VR HMD to collect 3D position data and design a 2D key point augmentation scheme for VR users. With the augmented 2D key point data and the VR HMD sensor data, we can train action recognition models with high accuracy and strong stability. Our data collection and experiments focus on classroom scenes, and the results can be extended to other scenes.
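
The abstract does not specify how the three modalities are fused, so the following is only a minimal sketch of one generic technique, weighted late fusion of per-modality class scores, and not the authors' network. The modality names (`hmd_sensor`, `rgb`, `keypoints_2d`), the function names, and the equal default weights are all assumptions made for illustration; only the counts of three modalities and nine actions come from the abstract.

```python
import math

NUM_ACTIONS = 9  # the abstract reports nine action classes


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(logits_by_modality, weights=None):
    """Weighted average of per-modality class probabilities.

    logits_by_modality: dict mapping a (hypothetical) modality name
    to a list of NUM_ACTIONS raw scores from that modality's classifier.
    weights: optional dict of non-negative weights; equal if omitted.
    """
    names = list(logits_by_modality)
    if weights is None:
        weights = {name: 1.0 for name in names}
    weight_sum = sum(weights[name] for name in names)

    fused = [0.0] * NUM_ACTIONS
    for name in names:
        logits = logits_by_modality[name]
        if len(logits) != NUM_ACTIONS:
            raise ValueError(f"{name}: expected {NUM_ACTIONS} scores")
        share = weights[name] / weight_sum
        for i, p in enumerate(softmax(logits)):
            fused[i] += share * p
    return fused


def predict_action(logits_by_modality, weights=None):
    """Return the index of the action with the highest fused probability."""
    fused = late_fusion(logits_by_modality, weights)
    return max(range(NUM_ACTIONS), key=fused.__getitem__)


# Illustrative scores only; these are not data from the paper.
example = {
    "hmd_sensor":   [0.1, 0.2, 2.5, 0.0, 0.3, 0.1, 0.0, 0.2, 0.1],
    "rgb":          [0.0, 1.8, 1.5, 0.1, 0.2, 0.0, 0.1, 0.0, 0.3],
    "keypoints_2d": [0.2, 0.1, 2.0, 0.4, 0.0, 0.1, 0.2, 0.1, 0.0],
}
fused_probs = late_fusion(example)
predicted = predict_action(example)
```

Late fusion keeps each modality's classifier independent, which makes it easy to drop a modality (for example, training on only the 2D key point and VR HMD sensor streams); the paper's actual fusion networks may combine features earlier or differently.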

Keywords: Action recognition; Data augmentation; Remote education; Virtual reality online platform.

© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.