Deep learning-based direction-of-arrival estimation for multiple speech sources using a small scale array

Min Zhang; Xiang Pan; Yining Shen; Jianjun Qiu

doi:10.1121/10.0005127

J Acoust Soc Am. 2021 Jun;149(6):3841. doi: 10.1121/10.0005127.

Authors

Min Zhang¹, Xiang Pan¹, Yining Shen¹, Jianjun Qiu²

Affiliations

¹ College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, China.
² Hangzhou Applied Acoustics Research Institute, Hangzhou, Zhejiang 310012, China.

PMID: 34241491

Abstract

A high-resolution direction-of-arrival (DOA) estimation approach based on deep neural networks (DNNs) is presented for localizing multiple speech sources with a small-scale array. First, three invariant features are extracted from the time-frequency spectrum of the input signal: generalized cross-correlation (GCC) coefficients, GCC coefficients in mel-scaled subbands, and the combination of GCC coefficients with the logarithmic mel spectrogram. Then, the DNN labels are designed to follow a Gaussian distribution, which resembles the spatial spectrum of multiple signal classification (MUSIC). Finally, DOAs are predicted by performing peak detection on the DNN outputs, where the maxima correspond to the speech signals of interest. In numerical simulations, the DNN-based DOA estimation method outperforms existing high-resolution beamforming techniques. Implemented with a four-element microphone array, the proposed framework can effectively localize multiple speech sources in an indoor environment.
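
The three stages named in the abstract (GCC features, Gaussian-shaped labels, peak detection) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not give the GCC weighting, the angular grid, the label width, or the peak threshold, so the PHAT weighting, the circular 0-360 degree grid, and the parameters `max_lag`, `sigma_deg`, and `min_height` are assumptions made for illustration, as are all function names. The DNN itself is omitted; peak picking is demonstrated directly on a label vector.

```python
import numpy as np


def gcc_phat(x, y, max_lag):
    """GCC coefficients with PHAT weighting (an assumed weighting) for
    lags -max_lag..max_lag. A positive peak lag means y lags behind x."""
    n = len(x) + len(y)                    # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)                 # cross-power spectrum
    cross /= np.abs(cross) + 1e-12         # PHAT: keep phase only
    cc = np.fft.irfft(cross, n=n)
    # Reorder so the output runs from lag -max_lag to lag +max_lag.
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))


def gaussian_labels(source_doas_deg, grid_deg, sigma_deg=8.0):
    """Target vector with one Gaussian bump per source on an angular grid.
    The circular grid and sigma_deg are illustrative assumptions."""
    grid = np.asarray(grid_deg, dtype=float)
    label = np.zeros_like(grid)
    for doa in source_doas_deg:
        # Wrapped angular difference in [-180, 180) degrees.
        diff = (grid - doa + 180.0) % 360.0 - 180.0
        label = np.maximum(label, np.exp(-0.5 * (diff / sigma_deg) ** 2))
    return label


def pick_doas(spectrum, grid_deg, num_sources, min_height=0.3):
    """Return the angles of the num_sources largest local maxima of a
    spatial-spectrum-like vector (circular neighbours assumed)."""
    s = np.asarray(spectrum, dtype=float)
    is_peak = (s > np.roll(s, 1)) & (s >= np.roll(s, -1)) & (s >= min_height)
    idx = np.flatnonzero(is_peak)
    idx = idx[np.argsort(s[idx])[::-1][:num_sources]]   # strongest peaks first
    return np.sort(np.asarray(grid_deg)[idx])


if __name__ == "__main__":
    # Two-channel toy signal: channel y is channel x delayed by 3 samples.
    rng = np.random.default_rng(0)
    s = rng.standard_normal(1027)
    x, y = s[3:], s[:-3]
    cc = gcc_phat(x, y, max_lag=10)
    print("estimated delay (samples):", int(np.argmax(cc)) - 10)

    # Labels for two sources, then recover the DOAs by peak picking.
    grid = np.arange(0, 360, 5)
    labels = gaussian_labels([60, 200], grid)
    print("picked DOAs (deg):", pick_doas(labels, grid, num_sources=2))
```

In a full pipeline, the GCC vectors from each microphone pair would form the network input and the network output would take the place of `labels` in the call to `pick_doas`. Taking the element-wise maximum of the Gaussian bumps, rather than their sum, keeps every label value at or below 1 when sources are close together; this is a design choice of the sketch, not something stated in the abstract.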