Preliminary Evaluation of Automated Speech Recognition Apps for the Hearing Impaired and Deaf

Front Digit Health. 2022 Feb 16:4:806076. doi: 10.3389/fdgth.2022.806076. eCollection 2022.

Abstract

Objective: Automated speech recognition (ASR) systems have become increasingly sophisticated, accurate, and deployable on many digital devices, including smartphones. This pilot study aims to examine the speech recognition performance of ASR apps using audiological speech tests. In addition, we compare the ASR apps' speech recognition performance to that of normal-hearing and hearing-impaired listeners and evaluate whether standard clinical audiological tests are a meaningful and quick measure of the performance of ASR apps.

Methods: Four apps were tested on a smartphone: AVA, Earfy, Live Transcribe, and Speechy. The Dutch audiological speech tests performed were speech audiometry in quiet (Dutch CNC-test), the Digits-in-Noise (DIN)-test with steady-state speech-shaped noise, and sentences in quiet and in long-term average speech-shaped spectrum noise (Plomp-test). For comparison, the apps' ability to transcribe a spoken dialogue (Dutch and English) was also tested.
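
The noise tests above present speech at a fixed signal-to-noise ratio (SNR). As an illustration only (not the authors' test software), the sketch below shows how a speech signal can be mixed with noise at a target SNR; the signals, sample rate, and random placeholders are hypothetical.

```python
# Minimal sketch: mix a speech signal with masking noise at a target SNR,
# conceptually similar to how DIN- and Plomp-style stimuli are constructed.
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    noise = noise[: len(speech)]                       # trim noise to the speech length
    p_speech = np.mean(speech ** 2)                    # average speech power
    p_noise = np.mean(noise ** 2)                      # average noise power
    target_p_noise = p_speech / (10 ** (snr_db / 10))  # noise power needed for the target SNR
    return speech + noise * np.sqrt(target_p_noise / p_noise)

# Example: placeholder 1-s signals at 16 kHz, mixed at +8 dB SNR.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)  # stands in for a recorded digit triplet or sentence
noise = rng.standard_normal(16000)   # stands in for steady-state speech-shaped noise
mixed = mix_at_snr(speech, noise, snr_db=8.0)
```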

Results: All apps scored at least 50% of phonemes correct on the Dutch CNC-test at a conversational speech intensity level (65 dB SPL) and achieved 90-100% phoneme recognition at higher intensity levels. On the DIN-test, AVA and Live Transcribe had the lowest (best) signal-to-noise ratio, +8 dB. The lowest signal-to-noise ratio measured with the Plomp-test was +8 to +9 dB, for Earfy (Android) and Live Transcribe (Android). Overall, the word error rate for the English dialogue (19-34%) was lower (better) than for the Dutch dialogue (25-66%).
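
The dialogue results are reported as word error rate (WER). A minimal sketch of the standard WER computation (word-level Levenshtein distance divided by the number of reference words) is shown below; the example sentences are illustrative, not taken from the study materials.

```python
# Minimal sketch: word error rate as the word-level edit distance between a
# reference transcript and an ASR hypothesis, normalised by reference length.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # substitution (free if words match)
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)  # deletion, insertion
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))  # 1 substitution / 6 words ≈ 0.17
```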

Conclusion: The performance of the apps was limited on audiological tests that provide little linguistic context or use low signal-to-noise ratios. On Dutch audiological speech tests in quiet, the ASR apps performed similarly to a person with moderate hearing loss. In noise, the ASR apps performed more poorly than most profoundly deaf listeners using a hearing aid or cochlear implant. Adding new performance metrics, including the semantic difference as a function of SNR and reverberation time, could help to monitor and further improve ASR performance.
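
The abstract does not specify how a "semantic difference as a function of SNR" would be computed. Purely as a hedged sketch of one possible reading, the example below sweeps hypothetical transcripts per SNR and reports one minus a crude bag-of-words cosine similarity as the semantic difference; a real implementation would use a proper sentence-embedding model.

```python
# Hedged sketch (assumed, not from the paper): semantic difference vs. SNR,
# with bag-of-words cosine similarity standing in for a sentence-embedding model.
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

reference = "please pass the salt and pepper"
transcripts_by_snr = {   # hypothetical ASR output at each SNR (dB)
    8: "please pass the salt and pepper",
    4: "please pass the salt",
    0: "please press the pepper",
}
for snr, hyp in transcripts_by_snr.items():
    # semantic difference = 1 - similarity; larger means the transcript drifted further in meaning
    print(f"SNR {snr:+d} dB: semantic difference = {1 - cosine_similarity(reference, hyp):.2f}")
```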

Keywords: automated speech recognition (ASR); automated speech audiometry; evaluation metric; hearing impairment; speech-to-text; voice-to-text technology.