Using Computer Vision to Annotate Video-Recoded Direct Observation of Physical Behavior

Sarah K Keadle; Skylar Eglowski; Katie Ylarregui; Scott J Strath; Julian Martinez; Alex Dekhtyar; Vadim Kagan

doi:10.3390/s24072359

Sensors (Basel). 2024 Apr 8;24(7):2359. doi: 10.3390/s24072359.

Authors

Sarah K Keadle¹, Skylar Eglowski², Katie Ylarregui¹, Scott J Strath³, Julian Martinez³, Alex Dekhtyar⁴, Vadim Kagan²

Affiliations

¹ Department of Kinesiology and Public Health, California Polytechnic State University, San Luis Obispo, CA 93407, USA.
² Sentimetrix Inc., Bethesda, MD 20814, USA.
³ College of Public Health, University of Wisconsin, Milwaukee, WI 53205, USA.
⁴ Department of Computer Science and Software Engineering, California Polytechnic State University, San Luis Obispo, CA 93407, USA.

Abstract

Direct observation is a ground-truth measure for physical behavior, but its high cost limits widespread use. The purpose of this study was to develop and test machine learning methods to recognize aspects of physical behavior and location from videos of human movement. Adults (N = 26, aged 18-59 y) were recorded in their natural environment for two 2- to 3-h sessions. Trained research assistants annotated the videos using commercially available software according to the following taxonomies: (1) sedentary versus non-sedentary (two classes); (2) activity type (four classes: sedentary, walking, running, and mixed movement); and (3) activity intensity (four classes: sedentary, light, moderate, and vigorous). Four machine learning approaches were trained and evaluated for each taxonomy. Models were trained on 80% of the videos and validated on 10%; final accuracy is reported on the remaining 10% of the videos, which were not used in training. Overall accuracy was as follows: 87.4% for Taxonomy 1, 63.1% for Taxonomy 2, and 68.6% for Taxonomy 3. This study shows it is possible to use computer vision to annotate aspects of physical behavior, reducing the time and labor required for direct observation. Future research should test these machine learning models on larger, independent datasets and take advantage of analysis of video fragments, rather than individual still images.
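
The evaluation design described in the abstract (an 80%/10%/10% split made at the level of whole videos, followed by overall accuracy on the held-out videos) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the video identifiers, and the fixed random seed are all hypothetical, and the abstract does not say how videos were assigned to each partition.

```python
import random


def split_videos(video_ids, train_pct=80, val_pct=10, seed=0):
    """Split whole videos into train/validation/test partitions.

    Splitting by video (not by frame) keeps every frame of a held-out
    video out of training. The percentages mirror the abstract; the
    shuffling strategy and seed are assumptions made for illustration.
    """
    ids = sorted(set(video_ids))       # deduplicate, deterministic order
    rng = random.Random(seed)          # local RNG, reproducible shuffle
    rng.shuffle(ids)
    n_train = len(ids) * train_pct // 100
    n_val = len(ids) * val_pct // 100
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]       # remainder is the held-out set
    return train, val, test


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of annotations where the prediction matches the label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


if __name__ == "__main__":
    # Hypothetical identifiers; the real dataset's naming is not given.
    videos = [f"video_{i:02d}" for i in range(50)]
    train, val, test = split_videos(videos)
    print(len(train), len(val), len(test))

    # Toy Taxonomy 1 labels (sedentary vs. non-sedentary), invented here.
    truth = ["sedentary", "non-sedentary", "sedentary", "sedentary"]
    preds = ["sedentary", "non-sedentary", "non-sedentary", "sedentary"]
    print(overall_accuracy(truth, preds))
```

Splitting on video identifiers rather than on individual frames is the design choice that matters here, since frames from the same recording are highly correlated and would otherwise inflate the reported accuracy.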

Keywords: assessment; computer vision; direct observation; physical activity; sedentary behavior.

MeSH terms

Adult
Computers*
Environment
Female
Humans
Labor, Obstetric*
Machine Learning
Pregnancy
Software

Grants and funding

75N91023C00031/CA/NCI NIH HHS/United States