Speech endpoint detection based on speech time-frequency enhancement and spectral entropy

Conf Proc IEEE Eng Med Biol Soc. 2005:2005:4682-4. doi: 10.1109/IEMBS.2005.1615515.

Abstract

In speech recognition, it is crucial to locate the endpoints of the input utterance precisely so that it is free of non-speech regions. This paper proposes a novel approach that finds robust features for endpoint detection in noisy environments. The proposed method integrates time-frequency enhancement with the spectral entropy feature. First, the noisy speech is enhanced in the frequency domain using spectral subtraction to remove additive noise. Then, in the time domain, a weight function built from short-time energy and zero-crossing rate is used to remove the noise produced by the spectral subtraction. Finally, a spectral entropy-based method is used to detect the endpoints. By monitoring the transitions of the extracted feature, more precise endpoints can be found. The proposed algorithm is shown to be well suited to speech endpoint detection and is very robust to different types of noise, especially at low SNR. Furthermore, the algorithm has low complexity and is suitable for real-time DSP systems.
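
The three stages described in the abstract can be illustrated with a minimal Python sketch using NumPy. The abstract does not give the exact formulas, so everything specific here is an assumption made for illustration: the frame length and hop, the use of the leading frames as a noise-only estimate, the spectral floor, the form of the energy/zero-crossing weight (normalized energy times one minus the zero-crossing rate), the way the weight is combined with the entropy feature, and the threshold rule. All function names are hypothetical. It should not be read as the authors' implementation.

```python
import numpy as np


def frame_signal(x, frame_len=256, hop=128):
    """Slice a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def spectral_subtraction(frames, noise_frames=10, floor=0.01):
    """Magnitude spectral subtraction.

    Assumption: the first `noise_frames` frames contain noise only.
    """
    spec = np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1)
    mag = np.abs(spec)
    noise_mag = mag[:noise_frames].mean(axis=0)
    # Subtract the noise estimate and keep a small spectral floor.
    clean_mag = np.maximum(mag - noise_mag, floor * noise_mag)
    # Resynthesize each (windowed) frame using the noisy phase.
    enhanced = np.fft.irfft(clean_mag * np.exp(1j * np.angle(spec)),
                            n=frames.shape[1], axis=1)
    return clean_mag, enhanced


def energy_zcr_weight(frames):
    """Hypothetical weight: large for energetic frames with few zero crossings."""
    energy = np.sum(frames ** 2, axis=1)
    signs = np.signbit(frames).astype(np.int8)
    zcr = np.mean(np.diff(signs, axis=1) != 0, axis=1)
    return (energy / (energy.max() + 1e-12)) * (1.0 - zcr)


def spectral_negentropy(mag):
    """log(number of bins) minus spectral entropy; large for peaky spectra."""
    power = mag ** 2
    p = power / (power.sum(axis=1, keepdims=True) + 1e-12)
    entropy = -np.sum(p * np.log(p + 1e-12), axis=1)
    return np.log(mag.shape[1]) - entropy


def detect_endpoints(x, frame_len=256, hop=128, noise_frames=10):
    """Return (start_sample, end_sample) of the detected region, or None."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    clean_mag, enhanced = spectral_subtraction(frames, noise_frames)
    # Assumed combination: weight the entropy feature frame by frame.
    feature = energy_zcr_weight(enhanced) * spectral_negentropy(clean_mag)
    # Assumed threshold: halfway between the noise baseline and the peak.
    base = feature[:noise_frames].mean()
    threshold = base + 0.5 * (feature.max() - base)
    active = np.flatnonzero(feature > threshold)
    if active.size == 0:
        return None
    return int(active[0] * hop), int(active[-1] * hop + frame_len)
```

Calling `detect_endpoints(x)` on a mono signal `x` that begins with a short stretch of background noise returns sample indices for the first and last frames whose feature exceeds the threshold. Because spectral entropy is invariant to frame scaling, this sketch multiplies the weight into the feature rather than into the waveform; that is a design choice of the sketch, not something stated in the abstract.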