Speech endpoint detection based on speech time-frequency enhancement and spectral entropy

Fan Yingle; Li Yi; Wu Chuanyan

doi:10.1109/IEMBS.2005.1615515


Conf Proc IEEE Eng Med Biol Soc. 2005;2005:4682-4. doi: 10.1109/IEMBS.2005.1615515.

Authors

Fan Yingle¹, Li Yi, Wu Chuanyan

Affiliation

¹ Department of Instrument Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China (e-mail: Fan@hziee.edu.cn).

PMID: 17281285

Abstract

In speech recognition, it is crucial to locate the endpoints of the input utterance precisely so that non-speech regions are excluded. This paper proposes a novel approach that extracts robust features for endpoint detection in noisy environments. The proposed method combines time-frequency enhancement with a spectral entropy feature. First, the noisy speech is enhanced in the frequency domain by spectral subtraction to remove additive noise. Then, in the time domain, a weight function built from the short-time energy and the zero-crossing rate is applied to remove the residual noise introduced by spectral subtraction. Finally, a spectral entropy-based method is used to detect the endpoints. By monitoring transitions in the extracted feature, the endpoints can be located more precisely. The proposed algorithm is shown to be well suited to speech endpoint detection and very robust to different types of noise, especially at low signal-to-noise ratio (SNR). Furthermore, the algorithm has low computational complexity and is suitable for real-time DSP systems.
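The three stages summarised in the abstract can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract gives no parameters, so the 128-sample non-overlapping rectangular frames, the assumption that the first 10 frames contain only noise, the over-subtraction factor alpha = 2 and the spectral floor beta = 0.01 are all illustrative choices. The abstract also does not define the energy/zero-crossing weight function or the entropy decision rule: the weight w = ez / (ez + k*mu), with ez = E*(1 + ZCR), and the "half of the noise entropy" threshold are hypothetical stand-ins, as are all function names.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [e + t for e, t in zip(even, tw)] + [e - t for e, t in zip(even, tw)]


def ifft_real(spec):
    """Inverse FFT via the conjugation trick, returning only the real part."""
    n = len(spec)
    return [c.real / n for c in fft([c.conjugate() for c in spec])]


def spectral_subtract(frames, noise_frames, alpha=2.0, beta=0.01):
    """Magnitude spectral subtraction; the noisy phase is kept unchanged."""
    specs = [fft(f) for f in frames]
    n = len(frames[0])
    # Noise magnitude estimate: mean over the leading (assumed noise-only) frames.
    noise_mag = [sum(abs(s[k]) for s in specs[:noise_frames]) / noise_frames
                 for k in range(n)]
    enhanced = []
    for s in specs:
        clean = [cmath.rect(max(abs(c) - alpha * noise_mag[k], beta * abs(c)),
                            cmath.phase(c))
                 for k, c in enumerate(s)]
        enhanced.append(ifft_real(clean))
    return enhanced


def energy_zcr_weights(frames, noise_frames, k=4.0):
    """HYPOTHETICAL weight w = ez / (ez + k * mu), where ez = energy * (1 + ZCR)."""
    ez = []
    for f in frames:
        energy = sum(v * v for v in f) / len(f)
        zcr = sum((a >= 0) != (b >= 0) for a, b in zip(f, f[1:])) / (len(f) - 1)
        ez.append(energy * (1.0 + zcr))
    mu = sum(ez[:noise_frames]) / noise_frames  # residual-noise reference level
    return [e / (e + k * mu) if e + k * mu > 0 else 0.0 for e in ez]


def spectral_entropies(frames, noise_frames, floor_ratio=50.0):
    """Entropy of each frame's normalised power spectrum plus a constant floor."""
    powers = []
    for f in frames:
        spec = fft(f)
        powers.append([abs(c) ** 2 for c in spec[:len(f) // 2 + 1]])
    # ASSUMPTION: the floor is a multiple of the mean per-bin power of the
    # leading frames, so attenuated frames look spectrally flat (high entropy).
    mean_noise = sum(sum(p) / len(p) for p in powers[:noise_frames]) / noise_frames
    floor = max(floor_ratio * mean_noise, 1e-12)
    entropies = []
    for p in powers:
        q = [v + floor for v in p]
        total = sum(q)
        entropies.append(-sum((v / total) * math.log(v / total) for v in q))
    return entropies


def detect_endpoints(x, n=128, noise_frames=10):
    """Return (first, last) speech frame indices, or None if no frame is speech."""
    frames = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
    enhanced = spectral_subtract(frames, noise_frames)    # frequency domain
    weights = energy_zcr_weights(enhanced, noise_frames)  # time domain
    weighted = [[w * v for v in f] for w, f in zip(weights, enhanced)]
    h = spectral_entropies(weighted, noise_frames)
    # HYPOTHETICAL rule: speech where entropy drops below half the noise level.
    threshold = 0.5 * sum(h[:noise_frames]) / noise_frames
    speech = [i for i, v in enumerate(h) if v < threshold]
    return (speech[0], speech[-1]) if speech else None


if __name__ == "__main__":
    rng = random.Random(0)
    fs, n = 8000, 128
    x = [rng.gauss(0.0, 0.05) for _ in range(64 * n)]
    for i in range(20 * n, 40 * n):  # 500 Hz tone occupying frames 20..39
        x[i] += math.sin(2 * math.pi * 500 * i / fs)
    print(detect_endpoints(x, n))
```

One design point is worth noting. Spectral entropy is scale-invariant, so multiplying a frame by a weight leaves its normalised spectrum unchanged. The sketch therefore adds a constant floor to the power spectrum before normalising (an assumption in the spirit of common spectral-entropy variants, not something stated in the abstract). With the floor in place, frames attenuated by the time-domain weight look spectrally flat and score high entropy, while strong speech frames keep a peaked, low-entropy spectrum.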