Role of mask pattern in intelligibility of ideal binary-masked noisy speech

J Acoust Soc Am. 2009 Sep;126(3):1415-26. doi: 10.1121/1.3179673.

Abstract

Intelligibility of ideal binary-masked noisy speech was measured on a group of normal-hearing individuals across mixture signal-to-noise ratio (SNR) levels, masker types, and local criteria for forming the binary mask. The binary mask is computed from time-frequency decompositions of target and masker signals using two different schemes: an ideal binary mask computed by thresholding the local SNR within time-frequency units and a target binary mask computed by comparing the local target energy against the long-term average speech spectrum. By depicting intelligibility scores as a function of the difference between mixture SNR and local SNR threshold, alignment of the performance curves is obtained for a large range of mixture SNR levels. Large intelligibility benefits are obtained for both sparse and dense binary masks. When an ideal mask is dense with many ones, the effect of changing mixture SNR level while fixing the mask is significant, whereas for sparser masks the effect is small or insignificant.
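
The two mask definitions in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the nested-list layout (one row per frequency channel, one entry per time frame), the strict ">" comparison, and the toy energies are all assumptions made for illustration, and the time-frequency decomposition itself is taken as already done. The last lines show why plotting against the difference between mixture SNR and local SNR threshold is natural: attenuating the masker by d dB raises every local SNR by d dB, so raising the threshold by the same d dB leaves the ideal mask unchanged.

```python
import math

def level_db(energy, reference):
    """Level of `energy` relative to `reference` in dB (both must be > 0)."""
    return 10.0 * math.log10(energy / reference)

def ideal_binary_mask(target, masker, lc_db):
    """1 where the local SNR (target vs. masker energy) exceeds lc_db, else 0."""
    return [[1 if level_db(t, m) > lc_db else 0 for t, m in zip(t_row, m_row)]
            for t_row, m_row in zip(target, masker)]

def target_binary_mask(target, ltass, lc_db):
    """1 where target energy exceeds the channel's long-term average
    speech spectrum value (one entry of `ltass` per channel) by lc_db."""
    return [[1 if level_db(t, ref) > lc_db else 0 for t in t_row]
            for t_row, ref in zip(target, ltass)]

def density(mask):
    """Fraction of ones in the mask (dense masks are close to 1)."""
    units = [u for row in mask for u in row]
    return sum(units) / len(units)

# Toy energies (hypothetical): 2 channels x 3 frames.
target = [[4.0, 0.5, 2.0], [1.0, 8.0, 0.1]]
masker = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]

ibm = ideal_binary_mask(target, masker, lc_db=0.0)          # [[1, 0, 1], [0, 1, 0]]
ibm_dense = ideal_binary_mask(target, masker, lc_db=-6.0)   # lower threshold, denser mask

# Raise the mixture SNR by 6 dB (attenuate the masker) and the threshold by 6 dB.
gain = 10.0 ** (6.0 / 10.0)
masker_quieter = [[m / gain for m in row] for row in masker]
ibm_shifted = ideal_binary_mask(target, masker_quieter, lc_db=6.0)  # same pattern as ibm
```

In this sketch, applying a mask means multiplying each time-frequency unit of the mixture by the corresponding 0 or 1 before resynthesis; that step is omitted here.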

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Acoustic Stimulation
  • Adult
  • Analysis of Variance
  • Automobiles
  • Humans
  • Middle Aged
  • Noise*
  • Noise, Occupational
  • Noise, Transportation
  • Perceptual Masking*
  • Psychoacoustics
  • Sound Spectrography
  • Speech Perception*
  • Speech*
  • Task Performance and Analysis