A model of speech recognition for hearing-impaired listeners based on deep learning

Jana Roßbach; Birger Kollmeier; Bernd T Meyer

doi:10.1121/10.0009411

J Acoust Soc Am. 2022 Mar;151(3):1417. doi: 10.1121/10.0009411.

Authors

Jana Roßbach¹, Birger Kollmeier², Bernd T Meyer¹

Affiliations

¹ Communication Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky University, D-26111 Oldenburg, Germany.
² Medical Physics and Cluster of Excellence Hearing4all, Carl von Ossietzky University, D-26111 Oldenburg, Germany.

PMID: 35364918

Abstract

Automatic speech recognition (ASR) has made major progress through deep machine learning, which has motivated the use of deep neural networks (DNNs) as perception models, specifically to predict human speech recognition (HSR). This study investigates whether a modeling approach based on a DNN that serves as a phoneme classifier [Spille, Ewert, Kollmeier, and Meyer (2018). Comput. Speech Lang. 48, 51-66] can predict HSR for subjects with different degrees of hearing loss when listening to speech embedded in different complex noises. The eight noise signals range from simple stationary noise to a single competing talker and are added to matrix sentences, which are presented to 20 hearing-impaired (HI) listeners (categorized into three groups with different types of age-related hearing loss) to measure their speech recognition threshold (SRT), i.e., the signal-to-noise ratio at which the word recognition rate is 50%. The measured SRTs are compared to responses obtained from the ASR-based model, which uses degraded feature representations that take into account each participant's individual hearing loss as captured by a pure-tone audiogram. Additionally, SRTs obtained from eight normal-hearing (NH) listeners are analyzed. For the NH subjects and the three groups of HI listeners, the average SRT prediction error is below 2 dB, which is lower than the errors of the baseline models.
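To make the SRT definition concrete, the following is a minimal sketch of how a 50% threshold could be read off a set of measured word recognition rates by linear interpolation. It is an illustration only and not the procedure used in the study; the function name `estimate_srt` and the example values are hypothetical.

```python
def estimate_srt(snrs_db, word_rates, target=0.5):
    """Return the SNR (in dB) at which the word recognition rate reaches `target`.

    Assumption: the rates rise with SNR around the target, so the threshold
    can be found by linear interpolation between the two bracketing points.
    """
    # Sort the measurement points by SNR so neighbours are adjacent.
    points = sorted(zip(snrs_db, word_rates))
    for (snr_lo, rate_lo), (snr_hi, rate_hi) in zip(points, points[1:]):
        # Look for the first rising segment that brackets the target rate.
        if rate_lo <= target <= rate_hi and rate_hi > rate_lo:
            frac = (target - rate_lo) / (rate_hi - rate_lo)
            return snr_lo + frac * (snr_hi - snr_lo)
    raise ValueError("target rate is not bracketed by the measurements")


# Hypothetical measurements: SNR in dB and fraction of words recognized.
example_snrs = [-12.0, -8.0, -4.0, 0.0]
example_rates = [0.0, 0.25, 0.75, 1.0]
example_srt = estimate_srt(example_snrs, example_rates)
```

With these made-up values the 50% point lies halfway between -8 dB and -4 dB, so the estimated SRT is -6 dB. A lower SRT means the listener (or model) tolerates more noise at the same recognition rate.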

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Deep Learning*
Hearing / physiology
Humans
Presbycusis*
Speech
Speech Perception* / physiology