Do we need STRFs for cocktail parties? On the relevance of physiologically motivated features for human speech perception derived from automatic speech recognition

Adv Exp Med Biol. 2013:787:333-41. doi: 10.1007/978-1-4614-1590-9_37.

Abstract

Complex auditory features such as spectro-temporal receptive fields (STRFs) derived from cortical auditory neurons appear to be advantageous in sound processing. However, their physiological and functional relevance is still unclear. To assess the utility of such feature processing for speech reception in noise, automatic speech recognition (ASR) performance using feature sets obtained from physiological and/or psychoacoustical data and models is compared to human performance. Time-frequency representations with a nonlinear compression are compared with standard features such as mel-scaled spectrograms. Both alternatives serve as input to model estimators that infer spectro-temporal filters (and a subsequent nonlinearity) from physiological measurements in auditory brain areas of zebra finches. Alternatively, a filter bank of 2-dimensional Gabor functions is employed, which covers a wide range of modulation frequencies in both the time and frequency domains. The results indicate a clear increase in ASR robustness using complex features (modeled by Gabor functions), while the benefit from physiologically derived STRFs is limited. In all cases, the use of power-normalized spectral representations increases performance, indicating that substantial dynamic compression is advantageous for level-independent pattern recognition. The methods employed may help physiologists to look for more relevant STRFs and to better understand specific differences in estimated STRFs.
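
As a rough illustration of the 2-dimensional Gabor functions mentioned in the abstract, the following minimal sketch builds one complex spectro-temporal Gabor filter and shows that it responds more strongly to a patch with matching modulation than to a patch whose spectral modulation is reversed. The Hann envelope, the filter size, the modulation frequencies, and the names `hann`, `gabor_2d`, and `response` are all assumptions made for this example; the abstract does not specify the filter parameters or the implementation used in the study.

```python
import cmath
import math


def hann(length):
    """Symmetric Hann window (assumed envelope; not specified in the abstract)."""
    if length == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]


def gabor_2d(n_freq, n_time, omega_k, omega_n):
    """Complex 2-D Gabor filter: plane-wave carrier times a Hann envelope.

    omega_k: spectral modulation frequency (radians per frequency channel)
    omega_n: temporal modulation frequency (radians per time frame)
    """
    win_k, win_n = hann(n_freq), hann(n_time)
    k0, n0 = n_freq // 2, n_time // 2
    return [[win_k[k] * win_n[n]
             * cmath.exp(1j * (omega_k * (k - k0) + omega_n * (n - n0)))
             for n in range(n_time)]
            for k in range(n_freq)]


def modulated_patch(n_freq, n_time, omega_k, omega_n):
    """Synthetic spectrogram patch with a single spectro-temporal modulation."""
    k0, n0 = n_freq // 2, n_time // 2
    return [[math.cos(omega_k * (k - k0) + omega_n * (n - n0))
             for n in range(n_time)]
            for k in range(n_freq)]


def response(patch, filt):
    """Magnitude of the inner product between a patch and the filter."""
    total = sum(patch[k][n] * filt[k][n].conjugate()
                for k in range(len(filt))
                for n in range(len(filt[0])))
    return abs(total)


# Hypothetical parameters chosen only for illustration.
N_FREQ, N_TIME = 15, 31
OMEGA_K, OMEGA_N = 2 * math.pi / 7, 2 * math.pi / 10

filt = gabor_2d(N_FREQ, N_TIME, OMEGA_K, OMEGA_N)
matched = response(modulated_patch(N_FREQ, N_TIME, OMEGA_K, OMEGA_N), filt)
mismatched = response(modulated_patch(N_FREQ, N_TIME, -OMEGA_K, OMEGA_N), filt)

print(f"matched response:    {matched:.2f}")
print(f"mismatched response: {mismatched:.2f}")
```

A filter bank in this spirit would repeat the construction over a grid of `omega_k` and `omega_n` values so that a range of spectral and temporal modulation frequencies is covered; the grid actually used in the study is not given in the abstract.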

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Acoustic Stimulation / methods
  • Animals
  • Auditory Cortex / cytology
  • Auditory Cortex / physiology*
  • Finches
  • Humans
  • Models, Biological*
  • Neurons / physiology
  • Noise*
  • Nonlinear Dynamics
  • Psychoacoustics*
  • Sound Localization / physiology
  • Speech Discrimination Tests
  • Speech Perception / physiology*
  • Time Perception / physiology