End-to-End Model-Based Detection of Infants with Autism Spectrum Disorder Using a Pretrained Model

Jung Hyuk Lee; Geon Woo Lee; Guiyoung Bong; Hee Jeong Yoo; Hong Kook Kim

doi:10.3390/s23010202

Sensors (Basel). 2022 Dec 25;23(1):202. doi: 10.3390/s23010202.

Authors

Jung Hyuk Lee¹, Geon Woo Lee², Guiyoung Bong³, Hee Jeong Yoo³,⁴, Hong Kook Kim¹,²

Affiliations

¹ School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Republic of Korea.
² AI Graduate School, Gwangju Institute of Science and Technology, Gwangju 61005, Republic of Korea.
³ Department of Psychiatry, Seoul National University Bundang Hospital, Seongnam 13620, Republic of Korea.
⁴ College of Medicine, Seoul National University, Seoul 03980, Republic of Korea.

Abstract

In this paper, we propose an end-to-end (E2E) neural network model that detects autism spectrum disorder (ASD) from children's voices without explicitly extracting deterministic features. To discriminate between the voices of children with ASD and those with typical development (TD), we combined two different feature-extraction models with a bidirectional long short-term memory (BLSTM)-based classifier that outputs the ASD/TD classification as a probability. One feature extractor is the bottleneck feature of an autoencoder that takes the extended version of the Geneva minimalistic acoustic parameter set (eGeMAPS) as input. The other is the context vector of a pretrained wav2vec2.0-based model applied directly to the waveform input. In addition, we optimized the E2E models in two different ways: (1) fine-tuning and (2) joint optimization. To evaluate the performance of the proposed E2E models, we prepared two datasets from video recordings of ASD diagnoses collected between 2016 and 2018 at Seoul National University Bundang Hospital (SNUBH) and between 2019 and 2021 at a Living Lab. In the experiments, the proposed wav2vec2.0-based E2E model with joint optimization achieved significant improvements over a conventional model that uses an autoencoder-based BLSTM and the deterministic features of the eGeMAPS: accuracy rose from 64.74% to 71.66%, and unweighted average recall rose from 65.04% to 70.81%.
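
The abstract reports both accuracy and unweighted average recall (UAR). As a minimal sketch of how the two differ, the Python below computes each from scratch. UAR is taken here in its usual sense, the mean of the per-class recalls; the function names and the toy labels are illustrative assumptions, not the authors' evaluation code or data.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class counts equally, whatever its size."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    hits = defaultdict(int)    # correctly predicted samples per true class
    totals = defaultdict(int)  # total samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical, deliberately imbalanced toy labels (not from the paper).
y_true = ["ASD"] * 2 + ["TD"] * 8
y_pred = ["TD"] * 10  # a degenerate classifier that always answers "TD"

acc = accuracy(y_true, y_pred)                   # 8 of 10 correct
uar = unweighted_average_recall(y_true, y_pred)  # (0/2 + 8/8) / 2
```

On these toy labels the always-TD classifier scores an accuracy of 0.8 but a UAR of only 0.5, which suggests why reporting UAR alongside accuracy is informative when the ASD and TD classes are not equally represented.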

Keywords: autism spectrum disorder; autoencoder; bidirectional long short-term memory (BLSTM); end-to-end neural network; joint optimization; pretrained model.

MeSH terms

Autism Spectrum Disorder* / diagnosis
Child
Humans
Infant
Memory, Long-Term
Video Recording / methods

Grants and funding

2019-0-00330/Institute for Information and Communications Technology Promotion