Using Computer Vision to Annotate Video-Recoded Direct Observation of Physical Behavior

Sensors (Basel). 2024 Apr 8;24(7):2359. doi: 10.3390/s24072359.

Abstract

Direct observation is a ground-truth measure for physical behavior, but the high cost limits widespread use. The purpose of this study was to develop and test machine learning methods to recognize aspects of physical behavior and location from videos of human movement: Adults (N = 26, aged 18-59 y) were recorded in their natural environment for two, 2- to 3-h sessions. Trained research assistants annotated videos using commercially available software including the following taxonomies: (1) sedentary versus non-sedentary (two classes); (2) activity type (four classes: sedentary, walking, running, and mixed movement); and (3) activity intensity (four classes: sedentary, light, moderate, and vigorous). Four machine learning approaches were trained and evaluated for each taxonomy. Models were trained on 80% of the videos, validated on 10%, and final accuracy is reported on the remaining 10% of the videos not used in training. Overall accuracy was as follows: 87.4% for Taxonomy 1, 63.1% for Taxonomy 2, and 68.6% for Taxonomy 3. This study shows it is possible to use computer vision to annotate aspects of physical behavior, speeding up the time and reducing labor required for direct observation. Future research should test these machine learning models on larger, independent datasets and take advantage of analysis of video fragments, rather than individual still images.

Keywords: assessment; computer vision; direct observation; physical activity; sedentary behavior.

MeSH terms

  • Adult
  • Computers*
  • Environment
  • Female
  • Humans
  • Labor, Obstetric*
  • Machine Learning
  • Pregnancy
  • Software