Spectro-temporal modulation energy based mask for robust speaker identification

Tai-Shih Chi; Ting-Han Lin; Chung-Chien Hsu

doi:10.1121/1.3697534

J Acoust Soc Am. 2012 May;131(5):EL368-74. doi: 10.1121/1.3697534.

Authors

Tai-Shih Chi¹, Ting-Han Lin, Chung-Chien Hsu

Affiliation

¹ Department of Electrical Engineering, National Chiao Tung University, Hsinchu 300, Taiwan. tschi@mail.nctu.edu.tw

PMID: 22559454
DOI: 10.1121/1.3697534

Abstract

Spectro-temporal modulations of speech encode speech structures and speaker characteristics. An algorithm that distinguishes speech from non-speech based on spectro-temporal modulation energies is proposed and evaluated in robust text-independent closed-set speaker identification simulations using the TIMIT and GRID corpora. Simulation results show that the proposed method produces much higher speaker identification rates in all signal-to-noise ratio (SNR) conditions than the baseline system using mel-frequency cepstral coefficients. The proposed method also outperforms a system that uses auditory-based nonnegative tensor cepstral coefficients [Q. Wu and L. Zhang, "Auditory sparse representation for robust speaker recognition based on tensor structure," EURASIP J. Audio, Speech, Music Process. 2008, 578612 (2008)] in low-SNR (≤ 10 dB) conditions.
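The abstract does not give the details of the mask, so the following is only an illustrative sketch of the general idea of a modulation-energy-based speech/non-speech decision, not the authors' algorithm. It assumes a log-spectrogram (channels x frames) as input, uses a plain 2-D FFT in place of an auditory model, and the function names, the rate band (2 to 16 Hz), the scale band (0 to 4 cycles/octave), the window sizes, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def speech_band_energy_ratio(patch, frame_rate, chans_per_octave,
                             rate_band=(2.0, 16.0), scale_band=(0.0, 4.0)):
    """Fraction of a patch's modulation energy inside an ASSUMED speech-like region.

    patch: 2-D array (channels x frames) of log-spectral energies.
    rate_band (Hz) and scale_band (cycles/octave) are illustrative guesses.
    """
    patch = patch - patch.mean()                       # drop the DC component
    energy = np.abs(np.fft.fft2(patch)) ** 2           # 2-D modulation energy
    # Spectral modulation axis (cycles/octave) and temporal modulation axis (Hz).
    scales = np.abs(np.fft.fftfreq(patch.shape[0], d=1.0 / chans_per_octave))
    rates = np.abs(np.fft.fftfreq(patch.shape[1], d=1.0 / frame_rate))
    in_band = ((scales[:, None] >= scale_band[0]) & (scales[:, None] <= scale_band[1]) &
               (rates[None, :] >= rate_band[0]) & (rates[None, :] <= rate_band[1]))
    total = energy.sum()
    return float(energy[in_band].sum() / total) if total > 0 else 0.0

def modulation_mask(log_spec, frame_rate, chans_per_octave,
                    win=100, hop=50, threshold=0.5):
    """Boolean mask, one entry per sliding window: True where the window looks speech-like.

    The window length, hop, and threshold are hypothetical, not values from the paper.
    """
    starts = range(0, log_spec.shape[1] - win + 1, hop)
    return np.array([
        speech_band_energy_ratio(log_spec[:, s:s + win], frame_rate, chans_per_octave) > threshold
        for s in starts
    ])

# Toy input: 16 channels, 100 frames/s. The first second is modulated at 4 Hz
# (inside the assumed speech band), the second at 40 Hz (outside it).
t = np.arange(100) / 100.0
slow = np.tile(np.cos(2 * np.pi * 4 * t), (16, 1))
fast = np.tile(np.cos(2 * np.pi * 40 * t), (16, 1))
toy_spec = np.concatenate([slow, fast], axis=1)
mask = modulation_mask(toy_spec, frame_rate=100.0, chans_per_octave=4.0)
print(mask[0], mask[-1])
```

In this toy case the window covering only the 4 Hz segment is kept and the window covering only the 40 Hz segment is rejected. A real front end would presumably replace the FFT with an auditory filterbank and tune the bands and threshold on data.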

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Cochlea / immunology
Cochlea / physiology
Computer Simulation
Female
Hearing / physiology
Humans
Male
Models, Biological
Neurons / physiology
Noise
Perceptual Masking / physiology
Sound Spectrography
Speech Intelligibility / physiology
Speech Perception / physiology*