Using 2D video-based pose estimation for automated prediction of autism spectrum disorders in young children

Sci Rep. 2021 Jul 23;11(1):15069. doi: 10.1038/s41598-021-94378-z.

Abstract

Clinical research in autism has recently produced promising digital phenotyping results, mainly focused on extracting single features such as gaze, head turning in response to name-calling, or visual tracking of a moving object. The main drawback of these studies is their focus on relatively isolated behaviors elicited by largely controlled prompts. We recognize that while the diagnostic process relies on indexing specific behaviors, ASD also comes with broad impairments that often transcend single behavioral acts. For instance, atypical nonverbal behaviors manifest through global patterns of atypical postures and movements, and through fewer gestures, which are often decoupled from eye contact, facial affect, and speech. Here, we tested the hypothesis that a deep neural network trained on the nonverbal aspects of social interaction can effectively differentiate between children with ASD and their typically developing peers. Our model achieves an accuracy of 80.9% (F1 score: 0.818; precision: 0.784; recall: 0.854), with the prediction probability positively correlated with the overall level of autism symptoms in the social affect and repetitive and restricted behaviors domains. Given the non-invasive and affordable nature of computer vision, our approach holds reasonable promise that reliable machine-learning-based ASD screening may become a reality in the not-too-distant future.
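The reported F1 score is, by the standard definition, the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short Python check below is a sketch using that textbook definition, not the authors' evaluation code; the function name `f1_score` here is illustrative and unrelated to any library. It confirms that the reported precision (0.784) and recall (0.854) are consistent with the reported F1 of 0.818.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    # Guard against division by zero when both metrics are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
reported_precision = 0.784
reported_recall = 0.854

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # F1 = 0.818
```

Note that accuracy (80.9%) cannot be recomputed the same way: it also depends on the class balance of the test set, which the abstract does not state.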

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Autism Spectrum Disorder / diagnosis*
  • Autism Spectrum Disorder / diagnostic imaging
  • Autism Spectrum Disorder / physiopathology
  • Child
  • Child, Preschool
  • Comprehension / physiology
  • Eye-Tracking Technology*
  • Female
  • Humans
  • Infant
  • Male
  • Social Behavior
  • Video Recording / methods*