Microscopic prediction of speech recognition for listeners with normal hearing in noise using an auditory model

J Acoust Soc Am. 2009 Nov;126(5):2635-48. doi: 10.1121/1.3224721.

Abstract

This study compares the phoneme recognition performance in speech-shaped noise of a microscopic model for speech recognition with the performance of normal-hearing listeners. In this model, "microscopic" has a twofold meaning. First, speech recognition rates are predicted on a phoneme-by-phoneme basis. Second, microscopic modeling means that the signal waveforms to be recognized are processed by mimicking elementary parts of human auditory processing. The model is based on an approach by Holube and Kollmeier [J. Acoust. Soc. Am. 100, 1703-1716 (1996)] and consists of a psychoacoustically and physiologically motivated preprocessing stage and a simple dynamic-time-warping (DTW) speech recognizer. The model is evaluated by presenting nonsense speech in a closed-set paradigm. Averaged phoneme recognition rates, specific phoneme recognition rates, and phoneme confusions are analyzed. The influence of different perceptual distance measures and of the model's a priori knowledge is investigated. The results show that human performance can be predicted by this model using an optimal detector, i.e., identical speech waveforms for both training and testing of the recognizer. The best model performance is achieved with distance measures that focus mainly on small perceptual distances and neglect outliers.
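The abstract describes the recognition back end only in general terms: a DTW recognizer that compares internal representations of test and training waveforms in a closed-set paradigm, using distance measures that emphasize small perceptual distances and neglect outliers. The sketch below illustrates that general kind of template-matching back end; it is not the authors' implementation, and the feature dimensions, the clipping limit, and all function names are illustrative assumptions.

```python
import numpy as np

def local_distance(x, y, limit=1.0):
    # Euclidean distance between two feature frames, clipped so that very
    # large ("outlier") distances contribute at most `limit`. The clipping
    # limit is an illustrative assumption, not a value from the paper.
    return min(np.linalg.norm(x - y), limit)

def dtw_distance(template, test, limit=1.0):
    """Accumulated dynamic-time-warping distance between two feature
    sequences (frames x channels), using a standard step pattern
    (match, insertion, deletion)."""
    n, m = len(template), len(test)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_distance(template[i - 1], test[j - 1], limit)
            acc[i, j] = d + min(acc[i - 1, j],      # deletion
                                acc[i, j - 1],      # insertion
                                acc[i - 1, j - 1])  # match
    return acc[n, m]

def recognize(test_features, templates):
    """Closed-set recognition: return the label of the template whose
    DTW distance to the test utterance is smallest."""
    return min(templates,
               key=lambda label: dtw_distance(templates[label], test_features))

# Hypothetical "internal representations" (frames x auditory channels)
# standing in for the output of the perceptual preprocessing stage.
templates = {"a": np.random.rand(40, 30), "i": np.random.rand(35, 30)}
test = np.random.rand(38, 30)
print(recognize(test, templates))
```

Clipping the frame-wise distance is one simple way to let the accumulated measure be dominated by small perceptual distances while neglecting outliers, in the spirit of the distance measures the abstract reports as performing best; the measures actually evaluated in the paper may be defined differently.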

Publication types

  • Comparative Study

MeSH terms

  • Adult
  • Auditory Cortex
  • Female
  • Hearing
  • Humans
  • Male
  • Models, Neurological*
  • Phonetics*
  • Predictive Value of Tests
  • Psychoacoustics*
  • Speech Intelligibility
  • Speech Perception*
  • Speech Reception Threshold Test
  • Speech Recognition Software*
  • Young Adult