Robust acoustic object detection

J Acoust Soc Am. 2005 Oct;118(4):2634-48. doi: 10.1121/1.2011411.

Abstract

We consider a novel approach to the problem of detecting phonological objects such as phonemes, syllables, or words directly from the speech signal. We begin by defining local features in the time-frequency plane with built-in robustness to intensity variations and time warping. Global templates of phonological objects correspond to the coincidence in time and frequency of patterns of the local features. These global templates are constructed by using the statistics of the local features in a principled way. The templates have clear phonetic interpretability, are easily adaptable, have built-in invariances, and display considerable robustness in the face of additive noise and clutter from competing speakers. We provide a detailed evaluation of the performance of some diphone detectors and a word detector based on this approach. We also perform some phonetic classification experiments based on the edge-based features suggested here.
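The abstract describes the detector only at a high level. What follows is a minimal sketch of the general idea it outlines, written in Python with NumPy. Binary local features are computed from differences of a log-spectrogram, so a global gain change, which adds a constant in the log domain, leaves them unchanged. The features are OR-spread in time to tolerate small warps. A template stores the per-location probability of each feature, estimated from aligned training examples. Windows are then scored by a log-likelihood ratio under an independence assumption. The specific edge definition, the function names (`binary_edges`, `spread_time`, `train_template`, `log_likelihood_ratio`, `detect`), the clipping constant `eps`, and the constant `background` probability are all illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np


def binary_edges(log_spec, thresh=0.0):
    """Hypothetical binary edge features from a log-spectrogram of shape (F, T).

    Channels 0/1: log-energy rises/falls from frame t to t+1 (same bin).
    Channels 2/3: log-energy rises/falls from bin f to f+1 (same frame).
    Only differences are used, so a global gain change (a constant added
    to every log-spectrogram entry) leaves the features unchanged.
    """
    F, T = log_spec.shape
    dt = np.diff(log_spec, axis=1)  # shape (F, T - 1)
    df = np.diff(log_spec, axis=0)  # shape (F - 1, T)
    edges = np.zeros((4, F, T), dtype=bool)
    edges[0, :, :-1] = dt > thresh
    edges[1, :, :-1] = dt < -thresh
    edges[2, :-1, :] = df > thresh
    edges[3, :-1, :] = df < -thresh
    return edges


def spread_time(edges, radius):
    """OR each edge over +/- radius frames so small time shifts still match."""
    out = edges.copy()
    for s in range(1, radius + 1):
        out[:, :, s:] |= edges[:, :, :-s]  # frame t also sees frame t - s
        out[:, :, :-s] |= edges[:, :, s:]  # frame t also sees frame t + s
    return out


def train_template(examples, eps=0.05):
    """Per-location edge probabilities from aligned examples of shape (4, F, W).

    Clipping to [eps, 1 - eps] keeps the log-likelihood weights finite.
    """
    p = np.mean(np.stack(examples).astype(float), axis=0)
    return np.clip(p, eps, 1.0 - eps)


def log_likelihood_ratio(x, template, background):
    """Template vs. background log-likelihood ratio, treating every
    (channel, bin, frame) feature as an independent Bernoulli variable."""
    x = x.astype(float)
    w1 = np.log(template / background)              # weight when the edge is on
    w0 = np.log((1 - template) / (1 - background))  # weight when the edge is off
    return float(np.sum(x * w1 + (1 - x) * w0))


def detect(edges, template, background, threshold):
    """Slide the template along time; return start frames scoring above threshold."""
    W = template.shape[2]
    T = edges.shape[2]
    scores = np.array([
        log_likelihood_ratio(edges[:, :, t:t + W], template, background)
        for t in range(T - W + 1)
    ])
    return np.flatnonzero(scores > threshold), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    log_spec = rng.normal(size=(16, 60))  # stand-in for a real log-spectrogram
    feats = spread_time(binary_edges(log_spec), radius=1)
    # Pretend frames 20..29 contain the object of interest.
    template = train_template([feats[:, :, 20:30]])
    hits, scores = detect(feats, template, background=0.3, threshold=0.0)
    print("best start frame:", int(np.argmax(scores)))
```

Spreading trades localization for tolerance to time warping. A larger `radius` lets a template match a slightly stretched or compressed instance, but it also raises the rate at which edges fire in the background. In this sketch that rate is just a fixed constant; a fuller implementation would presumably re-estimate it from data.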

MeSH terms

  • Acoustic Stimulation
  • Algorithms*
  • Databases, Factual
  • Humans
  • Models, Biological
  • Noise
  • Phonetics*
  • ROC Curve
  • Sound Spectrography
  • Speech Acoustics*
  • Speech Perception / physiology*
  • Speech Production Measurement
  • Time Factors