Robust fundamental frequency estimation in sustained vowels: detailed algorithmic comparisons and information fusion with adaptive Kalman filtering

Athanasios Tsanas; Matías Zañartu; Max A Little; Cynthia Fox; Lorraine O Ramig; Gari D Clifford

doi:10.1121/1.4870484

J Acoust Soc Am. 2014 May;135(5):2885-901. doi: 10.1121/1.4870484.

Authors

Athanasios Tsanas¹, Matías Zañartu², Max A Little³, Cynthia Fox⁴, Lorraine O Ramig⁵, Gari D Clifford¹

Affiliations

¹ Institute of Biomedical Engineering, Department of Engineering Science, Old Road Campus Research Building, University of Oxford, Headington, Oxford OX3 7DQ, United Kingdom.
² Department of Electronic Engineering at Universidad Técnica Federico Santa María, Av. España 1680, Casilla 110-V, Valparaiso 2390123, Chile.
³ MIT Media Lab, 77 Massachusetts Avenue, E14/E15, Cambridge, Massachusetts 02139-4307.
⁴ National Center for Voice and Speech, 136 South Main Street, Suite 320, Salt Lake City, Utah 84101-1623.
⁵ Speech, Language, and Hearing Sciences, 2501 Kittredge Loop Road, 409 UCB, University of Colorado, Boulder, Colorado 80309-0409.

Abstract

There has been consistent interest among speech signal processing researchers in the accurate estimation of the fundamental frequency (F₀) of speech signals. This study examines ten F₀ estimation algorithms (some well-established and some proposed more recently) to determine which of these algorithms is, on average, better able to estimate F₀ in the sustained vowel /a/. Moreover, a robust method for adaptively weighting the estimates of individual F₀ estimation algorithms based on quality and performance measures is proposed, using an adaptive Kalman filter (KF) framework. The accuracy of the algorithms is validated using (a) a database of 117 synthetic realistic phonations obtained using a sophisticated physiological model of speech production and (b) a database of 65 recordings of human phonations where the glottal cycles are calculated from electroglottograph signals. On average, the sawtooth waveform inspired pitch estimator and the nearly defect-free algorithms provided the best individual F₀ estimates, and the proposed KF approach resulted in a ∼16% improvement in accuracy over the best single F₀ estimation algorithm. These findings may be useful in speech signal processing applications where sustained vowels are used to assess vocal quality, when very accurate F₀ estimation is required.
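
The fusion idea in the abstract (weighting several fundamental frequency trackers inside a Kalman filter) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `fuse_f0`, the random-walk state model, the `base_var` and `process_var` parameters, and the innovation-based variance inflation rule are all assumptions chosen for illustration. The abstract states only that the weighting relies on quality and performance measures, without specifying them.

```python
import numpy as np


def fuse_f0(estimates, base_var, process_var=0.1):
    """Fuse per-frame F0 tracks with a scalar Kalman filter (illustrative only).

    estimates: array of shape (n_algorithms, n_frames), in Hz.
    base_var:  assumed nominal measurement variance per algorithm (hypothetical).
    process_var: assumed variance of the frame-to-frame F0 random walk.
    """
    estimates = np.asarray(estimates, dtype=float)
    base_var = np.asarray(base_var, dtype=float)
    n_frames = estimates.shape[1]

    x = float(np.median(estimates[:, 0]))  # initial state: median across algorithms
    p = float(np.max(base_var))            # initial state variance
    fused = np.empty(n_frames)

    for t in range(n_frames):
        # Predict: F0 is modelled as a random walk, so only the variance grows.
        p_pred = p + process_var
        z = estimates[:, t]

        # Adapt: inflate each algorithm's measurement variance when its
        # estimate disagrees strongly with the prediction (assumed rule).
        innovation = z - x
        r = base_var * (1.0 + innovation**2 / (p_pred + base_var))

        # Update in information form: inverse-variance weighted combination
        # of the prediction and all measurements.
        w_prior = 1.0 / p_pred
        w = 1.0 / r
        p = 1.0 / (w_prior + w.sum())
        x = p * (w_prior * x + (w * z).sum())
        fused[t] = x

    return fused


if __name__ == "__main__":
    # Three hypothetical trackers on a steady 120 Hz phonation; the third
    # makes an octave error (240 Hz) for ten frames.
    tracks = np.full((3, 50), 120.0)
    tracks[2, 20:30] = 240.0
    fused = fuse_f0(tracks, base_var=[1.0, 1.0, 1.0])
    print(np.max(np.abs(fused - 120.0)))
```

In this toy case the plain average across trackers jumps to 160 Hz during the octave error, whereas the adaptively weighted estimate stays close to 120 Hz because the outlying tracker's variance is inflated before the update.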

Publication types

Comparative Study
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Communication Aids for Disabled
Dysphonia / diagnosis
Dysphonia / physiopathology
Humans
Phonation*
Phonetics*
Pitch Perception
Sound Spectrography
Speech Acoustics
Speech Production Measurement / methods
Voice Quality
