Consonant confusions in white noise

J Acoust Soc Am. 2008 Aug;124(2):1220-33. doi: 10.1121/1.2913251.

Abstract

The classic [MN55] confusion matrix experiment (16 consonants, white noise masker) was repeated by using computerized procedures, similar to those of Phatak and Allen (2007). ["Consonant and vowel confusions in speech-weighted noise," J. Acoust. Soc. Am. 121, 2312-2316]. The consonant scores in white noise can be categorized in three sets: low-error set [/m/, /n/], average-error set [/p/, /t/, /k/, /s/, /[please see text]/, /d/, /g/, /z/, /Z/], and high-error set /f/theta/b/, /v/, /E/,/theta/]. The consonant confusions match those from MN55, except for the highly asymmetric voicing confusions of fricatives, biased in favor of voiced consonants. Masking noise cannot only reduce the recognition of a consonant, but also perceptually morph it into another consonant. There is a significant and systematic variability in the scores and confusion patterns of different utterances of the same consonant, which can be characterized as (a) confusion heterogeneity, where the competitors in the confusion groups of a consonant vary, and (b) threshold variability, where confusion threshold [i.e., signal-to-noise ratio (SNR) and score at which the confusion group is formed] varies. The average consonant error and errors for most of the individual consonants and consonant sets can be approximated as exponential functions of the articulation index (AI). An AI that is based on the peak-to-rms ratios of speech can explain the SNR differences across experiments.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Female
  • Humans
  • Male
  • Models, Biological
  • Perceptual Masking*
  • Phonetics*
  • Speech Acoustics*
  • Speech Intelligibility*
  • Speech Perception*