Automatic recognition of second language speech-in-noise

Seung-Eun Kim; Bronya R Chernyak; Olga Seleznova; Joseph Keshet; Matthew Goldrick; Ann R Bradlow

doi:10.1121/10.0024877

JASA Express Lett. 2024 Feb 1;4(2):025204. doi: 10.1121/10.0024877.

Authors

Seung-Eun Kim¹, Bronya R Chernyak², Olga Seleznova², Joseph Keshet², Matthew Goldrick¹, Ann R Bradlow¹

Affiliations

¹ Department of Linguistics, Northwestern University, Evanston, Illinois 60208, USA.
² Faculty of Electrical & Computer Engineering, Technion-Israel Institute of Technology, Haifa 3200003, Israel.
Email: seungeun.kim@northwestern.edu, chernroni@gmail.com, olga.s@technion.ac.il, jkeshet@technion.ac.il, matt-goldrick@northwestern.edu, abradlow@northwestern.edu

PMID: 38350077
DOI: 10.1121/10.0024877

Abstract

Measuring how well human listeners recognize speech under varying environmental conditions (speech intelligibility) is a challenge for theoretical, technological, and clinical approaches to speech communication. The current gold standard, human transcription, is time- and resource-intensive. Recent advances in automatic speech recognition (ASR) systems raise the possibility of automating intelligibility measurement. This study tested four state-of-the-art ASR systems with second language speech-in-noise and found that one, whisper, performed at or above human listener accuracy. However, the content of whisper's responses diverged substantially from human responses, especially at lower signal-to-noise ratios, suggesting both opportunities and limitations for ASR-based speech intelligibility modeling.

MeSH terms

Humans
Noise / adverse effects
Recognition, Psychology
Speech Intelligibility / physiology
Speech Perception* / physiology
Speech Recognition Software