Are Existing Monocular Computer Vision-Based 3D Motion Capture Approaches Ready for Deployment? A Methodological Study on the Example of Alpine Skiing

Mirela Ostrek; Helge Rhodin; Pascal Fua; Erich Müller; Jörg Spörri

doi:10.3390/s19194323

Sensors (Basel). 2019 Oct 6;19(19):4323. doi: 10.3390/s19194323.

Authors

Mirela Ostrek¹ ², Helge Rhodin³ ⁴, Pascal Fua⁵, Erich Müller⁶, Jörg Spörri⁷ ⁸

Affiliations

¹ Computer Vision Laboratory, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland. mirela.ostrek@gmail.com.
² Faculty of Electrical Engineering and Computing, University of Zagreb, 10000 Zagreb, Croatia. mirela.ostrek@gmail.com.
³ Computer Vision Laboratory, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland. helge.rhodin@ubc.ca.
⁴ Department of Computer Science, University of British Columbia, Vancouver, V6T 1Z4, Canada. helge.rhodin@ubc.ca.
⁵ Computer Vision Laboratory, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland. pascal.fua@epfl.ch.
⁶ Department of Sport Science and Kinesiology, University of Salzburg, 5400 Hallein-Rif, Austria. erich.mueller@sbg.ac.at.
⁷ Department of Sport Science and Kinesiology, University of Salzburg, 5400 Hallein-Rif, Austria. joerg.spoerri@balgrist.ch.
⁸ Department of Orthopedics, Balgrist University Hospital, Zurich, University of Zurich, 8008 Zurich, Switzerland. joerg.spoerri@balgrist.ch.

Abstract

In this study, we compared a monocular computer vision (MCV)-based approach with the gold standard for collecting kinematic data on ski tracks (i.e., video-based stereophotogrammetry) and assessed its deployment readiness for answering applied research questions in the context of alpine skiing. The investigated MCV-based approach predicted the three-dimensional human pose and ski orientation based on the image data from a single camera. The data set used for training and testing the underlying deep nets originated from a field experiment with six competitive alpine skiers. The normalized mean per joint position error of the MCV-based approach was found to be 0.08 ± 0.01 m. Knee flexion showed an accuracy and precision (in parentheses) of 0.4 ± 7.1° (7.2 ± 1.5°) for the outside leg, and -0.2 ± 5.0° (6.7 ± 1.1°) for the inside leg. For hip flexion, the corresponding values were -0.4 ± 6.1° (4.4 ± 1.5°) and -0.7 ± 4.7° (3.7 ± 1.0°), respectively. The accuracy and precision of skiing-related metrics were 0.03 ± 0.01 m (0.01 ± 0.00 m) for relative center of mass position, -0.1 ± 3.8° (3.4 ± 0.9°) for lean angle, 0.01 ± 0.03 m (0.02 ± 0.01 m) for center of mass to outside ankle distance, 0.01 ± 0.05 m (0.03 ± 0.01 m) for fore/aft position, and 0.00 ± 0.01 m² (0.01 ± 0.00 m²) for drag area. Such magnitudes can be considered acceptable for detecting relevant differences in the context of alpine skiing.
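
The abstract reports a normalized mean per joint position error but does not state how the normalization was done. As a minimal sketch, assuming the common definition of the unnormalized metric (the average Euclidean distance between predicted and reference joint positions) and omitting any normalization step, the computation might look as follows. The function name `mpjpe` and the sample coordinates are hypothetical and are not taken from the study.

```python
import math


def mpjpe(predicted, reference):
    """Mean per joint position error: the average Euclidean distance between
    corresponding predicted and reference 3D joints (same units as the input)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty joint lists of equal length")
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    return sum(distances) / len(distances)


# Hypothetical three-joint pose in metres (illustrative values, not study data).
reference = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 1.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (0.3, 0.5, 0.4), (0.0, 1.0, 0.1)]

error = mpjpe(predicted, reference)
print(f"MPJPE: {error:.2f} m")
```

Here the reference pose plays the role of the stereophotogrammetric measurement and the predicted pose that of the single-camera estimate; the per-joint distances are 0, 0.5, and 0.1 m.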

Keywords: alpine ski racing; biomechanics; human pose estimation; markerless tracking; technical validation; video-based 3D kinematics.