Vocal development in a large-scale crosslinguistic corpus

Margaret Cychosz; Alejandrina Cristia; Elika Bergelson; Marisa Casillas; Gladys Baudet; Anne S Warlaumont; Camila Scaff; Lisa Yankowitz; Amanda Seidl

doi:10.1111/desc.13090

Dev Sci. 2021 Sep;24(5):e13090. doi: 10.1111/desc.13090. Epub 2021 Apr 6.

Authors

Margaret Cychosz¹, Alejandrina Cristia², Elika Bergelson³, Marisa Casillas⁴, Gladys Baudet³, Anne S Warlaumont⁵, Camila Scaff²,⁶, Lisa Yankowitz⁷, Amanda Seidl⁸

Affiliations

¹ Department of Hearing and Speech Sciences & Center for Comparative and Evolutionary Biology of Hearing, University of Maryland, College Park, MD, USA.
² Laboratoire de Sciences Cognitives et de Psycholinguistique, Département d'études cognitives, ENS, EHESS, CNRS, PSL University, Paris, France.
³ Department of Psychology & Neuroscience, Center for Cognitive Neuroscience, Duke University, Durham, NC, USA.
⁴ Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands.
⁵ Department of Communication, University of California, Los Angeles, Los Angeles, CA, USA.
⁶ Human Ecology Group, Institute of Evolutionary Medicine, University of Zurich, Zurich, Switzerland.
⁷ Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA.
⁸ Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN, USA.

Abstract

This study evaluates whether early vocalizations develop in similar ways in children across diverse cultural contexts. We analyze data from daylong audio recordings of 49 children (1-36 months) from five different language/cultural backgrounds. Citizen scientists annotated these recordings to determine if child vocalizations contained canonical transitions or not (e.g., "ba" vs. "ee"). Results revealed that the proportion of clips reported to contain canonical transitions increased with age. Furthermore, this proportion exceeded 0.15 by around 7 months, replicating and extending previous findings on canonical vocalization development but using data from the natural environments of a culturally and linguistically diverse sample. This work explores how crowdsourcing can be used to annotate corpora, helping establish developmental milestones relevant to multiple languages and cultures. Lower inter-annotator reliability on the crowdsourcing platform, relative to more traditional in-lab expert annotators, means that a larger number of unique annotators and/or annotations are required, and that crowdsourcing may not be a suitable method for more fine-grained annotation decisions. Audio clips used for this project are compiled into a large-scale infant vocalization corpus that is available for other researchers to use in future work.
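
As an illustration of the measure described above, the proportion of clips reported to contain canonical transitions can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the label strings, the majority-vote rule for combining several annotators' judgments, the exclusion of tied clips, and the function names are all hypothetical. Only the 0.15 reference value comes from the abstract.

```python
from collections import Counter

# Reference value mentioned in the abstract; everything else here is illustrative.
CANONICAL_THRESHOLD = 0.15


def majority_label(labels):
    """Return the most frequent label for one clip, or None if the top two tie.

    Majority voting is an assumption made for this sketch.
    """
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def canonical_proportion(clips):
    """Proportion of decided clips whose majority label is "canonical".

    `clips` is a list of per-clip label lists (one label per annotator).
    Clips with no labels or with a tied vote are excluded (an assumption).
    Returns None when no clip can be decided.
    """
    decided = [majority_label(labels) for labels in clips if labels]
    decided = [label for label in decided if label is not None]
    if not decided:
        return None
    return sum(label == "canonical" for label in decided) / len(decided)


# Made-up annotations for one hypothetical child.
example_clips = [
    ["canonical", "canonical", "non-canonical"],
    ["non-canonical", "non-canonical", "non-canonical"],
    ["canonical", "non-canonical"],  # tied vote: excluded
    ["non-canonical", "non-canonical", "canonical"],
    ["canonical", "canonical", "canonical"],
]

proportion = canonical_proportion(example_clips)
exceeds_threshold = proportion is not None and proportion > CANONICAL_THRESHOLD
print(proportion, exceeds_threshold)
```

Because crowdsourced labels are noisier than expert labels, a rule of this kind becomes more stable as the number of annotators per clip grows, which is consistent with the abstract's point that more annotators and/or annotations are required.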

Keywords: babbling; crosslinguistic; crowdsourcing; infants; naturalistic recording; speech; vocal development.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Child
Humans
Infant
Language Development*
Language*
Reproducibility of Results

Grants and funding

DP5 OD019812/OD/NIH HHS/United States