Robust fundamental frequency estimation in sustained vowels: detailed algorithmic comparisons and information fusion with adaptive Kalman filtering

J Acoust Soc Am. 2014 May;135(5):2885-901. doi: 10.1121/1.4870484.

Abstract

There has been consistent interest among speech signal processing researchers in the accurate estimation of the fundamental frequency (F0) of speech signals. This study examines ten F0 estimation algorithms (some well-established and some proposed more recently) to determine which of these algorithms is, on average, better able to estimate F0 in the sustained vowel /a/. Moreover, a robust method for adaptively weighting the estimates of individual F0 estimation algorithms based on quality and performance measures is proposed, using an adaptive Kalman filter (KF) framework. The accuracy of the algorithms is validated using (a) a database of 117 synthetic realistic phonations obtained using a sophisticated physiological model of speech production and (b) a database of 65 recordings of human phonations where the glottal cycles are calculated from electroglottograph signals. On average, the sawtooth waveform inspired pitch estimator and the nearly defect-free algorithms provided the best individual F0 estimates, and the proposed KF approach resulted in a ∼16% improvement in accuracy over the best single F0 estimation algorithm. These findings may be useful in speech signal processing applications where sustained vowels are used to assess vocal quality, when very accurate F0 estimation is required.
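
The general idea of fusing several fundamental frequency estimates with a Kalman filter can be illustrated with a minimal sketch. This is not the method of the paper, whose quality and performance measures are not described in the abstract. The sketch assumes a scalar random-walk state model and a simple heuristic that inflates an estimator's measurement variance by its squared innovation, so that an estimate far from the current prediction receives little weight. The function name `fuse_f0` and the parameters `base_var` and `process_var` are hypothetical.

```python
def fuse_f0(estimates, base_var, process_var=1.0):
    """Fuse per-frame F0 estimates (Hz) from several estimators.

    estimates:   list of frames; each frame is a list with one F0 value per estimator
    base_var:    assumed nominal measurement variance (Hz^2) for each estimator
    process_var: assumed variance (Hz^2) of the frame-to-frame random walk of F0
    """
    # Initial state: mean of the first frame; initial uncertainty: largest nominal variance.
    x = sum(estimates[0]) / len(estimates[0])
    p = max(base_var)
    fused = []
    for frame in estimates:
        # Predict step: F0 is modelled as a random walk, so only the variance grows.
        p += process_var
        # Update step: process each estimator's measurement sequentially.
        for z, r0 in zip(frame, base_var):
            innovation = z - x
            # Heuristic adaptation (an assumption of this sketch): inflate the
            # measurement variance when the estimate disagrees with the prediction.
            r = r0 + innovation ** 2
            k = p / (p + r)          # Kalman gain
            x += k * innovation      # corrected state
            p *= (1.0 - k)           # corrected variance
        fused.append(x)
    return fused


# Three hypothetical estimators tracking a voice near 120 Hz; the second one
# makes an octave (halving) error in the middle frame.
frames = [[120.0, 121.0, 119.0],
          [120.0, 60.0, 121.0],
          [121.0, 120.0, 120.0]]
fused = fuse_f0(frames, base_var=[1.0, 1.0, 1.0])
```

In this toy example the plain average of the middle frame is pulled to roughly 100 Hz by the octave error, whereas the fused track stays close to 120 Hz because the outlying estimate is down-weighted.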

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Communication Aids for Disabled
  • Dysphonia / diagnosis
  • Dysphonia / physiopathology
  • Humans
  • Phonation*
  • Phonetics*
  • Pitch Perception
  • Sound Spectrography
  • Speech Acoustics
  • Speech Production Measurement / methods
  • Voice Quality