Describing Vocalizations in Young Children: A Big Data Approach Through Citizen Science Annotation

J Speech Lang Hear Res. 2021 Jul 16;64(7):2401-2416. doi: 10.1044/2021_JSLHR-20-00661. Epub 2021 Jun 7.

Abstract

Purpose: Recording young children's vocalizations through wearables is a promising method to assess language development. However, accurately and rapidly annotating these files remains challenging. Online crowdsourcing with the collaboration of citizen scientists could be a feasible solution. In this article, we assess the extent to which citizen scientists' annotations align with those gathered in the lab for recordings collected from young children.

Method: Segments identified by Language ENvironment Analysis as produced by the key child were extracted from one daylong recording for each of 20 participants: 10 low-risk control children and 10 children diagnosed with Angelman syndrome, a neurogenetic syndrome characterized by severe language impairments. Speech samples were annotated by trained annotators in the laboratory as well as by citizen scientists on Zooniverse. All annotators assigned one of five labels to each sample: Canonical, Noncanonical, Crying, Laughing, and Junk. This allowed the derivation of two child-level vocalization metrics: the Linguistic Proportion and the Canonical Proportion.

Results: At the segment level, Zooniverse classifications had moderate precision and recall. More importantly, the Linguistic Proportion and the Canonical Proportion derived from Zooniverse annotations were highly correlated with those derived from laboratory annotations.

Conclusions: Annotations obtained through a citizen science platform can help us overcome challenges posed by the process of annotating daylong speech recordings. Particularly when used in composites or derived metrics, such annotations can be used to investigate early markers of language delays.
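The abstract names the five labels and the two child-level metrics but does not define the metrics or say how multiple citizen-science judgments per segment were combined. The sketch below is therefore only an illustration under stated assumptions: it takes a majority vote per segment, treats the Linguistic Proportion as (Canonical + Noncanonical) over all non-Junk vocalizations, and treats the Canonical Proportion as Canonical over (Canonical + Noncanonical). The function names `majority_label` and `child_metrics` are hypothetical and are not taken from the article.

```python
from collections import Counter

# The five labels listed in the abstract.
LABELS = {"Canonical", "Noncanonical", "Crying", "Laughing", "Junk"}


def majority_label(votes):
    """Collapse several annotators' votes for one segment into one label.

    ASSUMPTION: simple majority vote; the abstract does not describe the
    aggregation rule. Ties resolve to the label that appears first in `votes`.
    """
    return Counter(votes).most_common(1)[0][0]


def child_metrics(segment_labels):
    """Return (linguistic_proportion, canonical_proportion) for one child.

    ASSUMPTION: both formulas are illustrative definitions, not quoted
    from the abstract. Junk segments are excluded from every denominator.
    A proportion is None when its denominator is zero.
    """
    counts = Counter(segment_labels)
    canonical = counts["Canonical"]
    linguistic = canonical + counts["Noncanonical"]
    vocalizations = linguistic + counts["Crying"] + counts["Laughing"]

    linguistic_proportion = linguistic / vocalizations if vocalizations else None
    canonical_proportion = canonical / linguistic if linguistic else None
    return linguistic_proportion, canonical_proportion


if __name__ == "__main__":
    # Made-up votes for four segments from one hypothetical child.
    votes_per_segment = [
        ["Canonical", "Canonical", "Noncanonical"],
        ["Noncanonical", "Noncanonical", "Junk"],
        ["Crying", "Crying", "Laughing"],
        ["Junk", "Junk", "Noncanonical"],
    ]
    labels = [majority_label(v) for v in votes_per_segment]
    print(labels, child_metrics(labels))
```

Computing the same two proportions from laboratory labels and from aggregated Zooniverse labels for each child would give the paired values whose correlation the Results describe.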

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Big Data
  • Child, Preschool
  • Citizen Science*
  • Humans
  • Language Development
  • Language Development Disorders* / diagnosis
  • Speech