Vocal markers of autism: Assessing the generalizability of machine learning models

Autism Res. 2022 Jun;15(6):1018-1030. doi: 10.1002/aur.2721. Epub 2022 Apr 6.

Authors

Astrid Rybner¹, Emil Trenckner Jessen¹, Marie Damsgaard Mortensen¹, Stine Nyhus Larsen¹, Ruth Grossman², Niels Bilenberg³, Cathriona Cantio³ ⁴, Jens Richardt Møllegaard Jepsen⁵ ⁶, Ethan Weed¹ ⁷, Arndis Simonsen⁷ ⁸, Riccardo Fusaroli¹ ⁷ ⁹

Affiliations

¹ Linguistics, Cognitive Science and Semiotics, School of Communication and Culture, Aarhus University, Aarhus, Denmark.
² Communication Sciences and Disorders, Emerson College, Boston, Massachusetts, USA.
³ Child and Youth Psychiatry, University of Southern Denmark, Odense, Denmark.
⁴ Psychology, University of Southern Denmark, Odense, Denmark.
⁵ Child and Adolescent Mental Health Centre, Mental Health Services in the Capital Region of Denmark, Copenhagen, Denmark.
⁶ Center for Neuropsychiatric Schizophrenia Research and Center for Clinical Intervention and Neuropsychiatric Schizophrenia Research, Mental Health Services in the Capital Region of Denmark, Copenhagen, Denmark.
⁷ Interacting Minds Center, School of Culture and Society, Aarhus University, Aarhus, Denmark.
⁸ Psychosis Research Unit, Aarhus University Hospital, Aarhus, Denmark.
⁹ Linguistic Data Consortium, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

PMID: 35385224
DOI: 10.1002/aur.2721

Abstract

Machine learning (ML) approaches show increasing promise in their ability to identify vocal markers of autism. Nonetheless, it is unclear to what extent such markers generalize to new speech samples collected, for example, using a different speech task or in a different language. In this paper, we systematically assess the generalizability of ML findings across a variety of contexts. We train promising published ML models of vocal markers of autism on novel cross-linguistic datasets following a rigorous pipeline to minimize overfitting, including cross-validated training and ensemble models. We assess the generalizability of the models by testing them on (i) different participants from the same study, performing the same task; (ii) the same participants, performing a different (but similar) task; and (iii) a different study with participants speaking a different language, performing the same type of task. While model performance is similar to previously published findings when trained and tested on data from the same study (out-of-sample performance), there is considerable variance between studies. Crucially, the models do not generalize well to different, though similar, tasks and not at all to new languages. The ML pipeline is openly shared. Generalizability of ML models of vocal markers of autism is an issue. We outline three recommendations for strategies researchers could take to be more explicit about generalizability and improve it in future studies.

LAY SUMMARY: Machine learning approaches promise to be able to identify autism from voice only. These models underestimate how diverse the contexts in which we speak are, how diverse the languages used are and how diverse autistic voices are. Machine learning approaches need to be more careful in defining their limits and generalizability.

Keywords: autism spectrum disorder; biobehavioral markers; generalizability; machine learning; voice.
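
The abstract's first generalizability test (different participants from the same study) and its mention of cross-validated training and ensemble models can be illustrated with a minimal sketch. This is not the authors' openly shared pipeline: the function names `participant_folds`, `train_test_splits`, and `ensemble_predict` are hypothetical, the round-robin fold assignment is an assumption made for simplicity, and averaging predicted probabilities is only one common way to build an ensemble.

```python
from collections import defaultdict


def participant_folds(participant_ids, k):
    """Group sample indices into k folds so that every sample from a given
    participant lands in the same fold (no speaker appears in two folds)."""
    by_participant = defaultdict(list)
    for index, pid in enumerate(participant_ids):
        by_participant[pid].append(index)
    folds = [[] for _ in range(k)]
    # Assumption: assign participants round-robin in sorted order so the
    # split is reproducible; a real pipeline might also balance diagnosis.
    for position, pid in enumerate(sorted(by_participant)):
        folds[position % k].extend(by_participant[pid])
    return folds


def train_test_splits(participant_ids, k):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    folds = participant_folds(participant_ids, k)
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(
            i for f, fold in enumerate(folds) if f != held_out for i in fold
        )
        yield train, test


def ensemble_predict(models, features):
    """Average the probabilities returned by several fitted models.
    Each model is any callable mapping a feature vector to a probability."""
    return sum(model(features) for model in models) / len(models)


if __name__ == "__main__":
    # Hypothetical recordings: several speech samples per participant.
    ids = ["p1", "p1", "p2", "p3", "p3", "p4"]
    for train, test in train_test_splits(ids, k=2):
        print("train:", train, "test:", test)
```

Splitting by participant rather than by recording matters because several samples from one speaker share voice characteristics; a split by recording would let a model recognize the speaker instead of the condition and would overstate out-of-sample performance.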

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Autism Spectrum Disorder*
Autistic Disorder* / diagnosis
Biomarkers
Humans
Machine Learning
Speech
Voice*

Substances

Biomarkers