A frequency bin-wise nonlinear masking algorithm in convolutive mixtures for speech segregation

Tai-Shih Chi; Ching-Wen Huang; Wen-Sheng Chou

doi:10.1121/1.3697530

J Acoust Soc Am. 2012 May;131(5):EL361-7. doi: 10.1121/1.3697530.

Authors

Tai-Shih Chi¹, Ching-Wen Huang, Wen-Sheng Chou

Affiliation

¹ Department of Electrical Engineering, National Chiao Tung University, Hsinchu 300, Taiwan. tschi@mail.nctu.edu.tw

PMID: 22559453

Abstract

A frequency bin-wise nonlinear masking algorithm operating in the spectrogram domain is proposed for speech segregation in convolutive mixtures. The contribution weight of each speech source to each time-frequency unit of the mixture spectrogram is estimated by a nonlinear function of location cues. For each sound source, a non-binary mask is formed from the estimated weights and multiplied with the mixture spectrogram to extract that source. Head-related transfer functions (HRTFs) are used to simulate the convolutive sound mixtures perceived by listeners. Simulation results show that the proposed method outperforms convolutive independent component analysis (ICA) and the degenerate unmixing estimation technique (DUET) in almost all test conditions.
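The masking step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the nonlinear weighting function, so the code below assumes a softmax over a per-source cue mismatch; the names `soft_masks`, `apply_masks`, `cue_distance`, and `sharpness` are hypothetical and are not taken from the paper. It shows only the general idea of forming non-binary masks per frequency bin and multiplying them with a mixture spectrogram.

```python
import numpy as np


def soft_masks(cue_distance, sharpness=5.0):
    """Turn per-source cue mismatches into non-binary masks.

    cue_distance has shape (n_sources, n_freq, n_frames). Each entry is an
    assumed non-negative mismatch between the location cue observed in a
    time-frequency unit and the cue expected for that source in that bin.
    The softmax used here is an illustrative choice, not the paper's function.
    """
    logits = -sharpness * np.asarray(cue_distance, dtype=float)
    # Subtract the per-unit maximum for numerical stability.
    logits -= logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    # Normalise across sources so the weights in each unit sum to 1.
    return weights / weights.sum(axis=0, keepdims=True)


def apply_masks(mixture_spec, masks):
    """Multiply each source mask with the (complex) mixture spectrogram."""
    return masks * mixture_spec[np.newaxis, ...]


# Toy data: 2 sources, 3 frequency bins, 2 frames (values are made up).
rng = np.random.default_rng(0)
mixture = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
distance = np.array([
    [[0.1, 0.9], [0.2, 0.8], [0.5, 0.5]],  # mismatch to source 0
    [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5]],  # mismatch to source 1
])

masks = soft_masks(distance)
estimates = apply_masks(mixture, masks)
```

Because the masks are normalised across sources, the per-source estimates add back up to the mixture, and a unit whose cues match both sources equally well is split evenly instead of being assigned to one source as a binary mask would do.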

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Female
Humans
Male
Music
Noise
Perceptual Masking / physiology*
Sound Localization / physiology
Speech Perception / physiology*