Recognition of speech in noise after application of time-frequency masks: dependence on frequency and threshold parameters

J Acoust Soc Am. 2013 Apr;133(4):2390-6. doi: 10.1121/1.4792143.

Abstract

Binary time-frequency (TF) masks can be applied to separate speech from noise. Previous studies have shown that with appropriate parameters, ideal TF masks can extract highly intelligible speech even at very low speech-to-noise ratios (SNRs). Two psychophysical experiments provided additional information about the dependence of intelligibility on the frequency resolution and threshold criteria that define the ideal TF mask. Listeners identified AzBio Sentences in noise, before and after application of TF masks. Masks generated with 8 or 16 frequency bands per octave supported nearly perfect identification. Word recognition accuracy was slightly lower and more variable with 4 bands per octave. When TF masks were generated with a local threshold criterion of 0 dB SNR, the mean speech reception threshold was -9.5 dB SNR, compared to -5.7 dB for unprocessed sentences in noise. Speech reception thresholds decreased by about 1 dB per dB of additional decrease in the local threshold criterion. Information reported here about the dependence of speech intelligibility on frequency and level parameters has relevance for the development of non-ideal TF masks for clinical applications such as speech processing for hearing aids.
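To illustrate what a local threshold criterion means for an ideal binary TF mask, here is a minimal sketch in Python. It is not the authors' implementation: the function names `ideal_binary_mask` and `apply_mask`, the nested-list layout (frequency bands by time frames), the toy power values, and the choice of a strict "greater than" comparison at the threshold are all assumptions made for illustration. The mask is "ideal" only in the sense that it is computed from the speech and noise powers separately, which are known before mixing.

```python
import math


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return a binary mask over time-frequency units.

    A unit is kept (1) when its local SNR in dB exceeds lc_db, the
    local threshold criterion, and discarded (0) otherwise.
    Inputs are same-shaped nested lists (bands x frames) of
    non-negative power values. The strict ">" is an assumption.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0.0:
                keep = 0  # no speech energy in this unit
            elif n <= 0.0:
                keep = 1  # speech present, no noise
            else:
                local_snr_db = 10.0 * math.log10(s / n)
                keep = 1 if local_snr_db > lc_db else 0
            row.append(keep)
        mask.append(row)
    return mask


def apply_mask(mixture, mask):
    """Zero out the mixture's TF units wherever the mask is 0."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, mixture)
    ]


# Hypothetical toy data: 2 frequency bands x 3 time frames.
speech = [[4.0, 1.0, 0.5], [0.1, 2.0, 1.0]]
noise = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

mask_0db = ideal_binary_mask(speech, noise, lc_db=0.0)
mask_m6db = ideal_binary_mask(speech, noise, lc_db=-6.0)

print(mask_0db)   # [[1, 0, 0], [0, 1, 0]]
print(mask_m6db)  # [[1, 1, 1], [0, 1, 1]]
```

Lowering the criterion from 0 dB to -6 dB retains more units, including ones where noise is as strong as or stronger than the speech. The sketch shows only how the mask changes with the criterion; it says nothing about intelligibility, which the abstract reports from listening experiments.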

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Acoustic Stimulation
  • Adult
  • Audiometry, Speech
  • Auditory Threshold*
  • Cues*
  • Female
  • Humans
  • Male
  • Middle Aged
  • Noise / adverse effects*
  • Perceptual Masking*
  • Pitch Perception*
  • Psychoacoustics
  • Recognition, Psychology*
  • Signal-To-Noise Ratio
  • Sound Spectrography
  • Speech Intelligibility*
  • Speech Perception*
  • Time Factors
  • Young Adult