Deep learning-based direction-of-arrival estimation for multiple speech sources using a small scale array

J Acoust Soc Am. 2021 Jun;149(6):3841. doi: 10.1121/10.0005127.

Abstract

A high resolution direction-of-arrival (DOA) approach is presented based on deep neural networks (DNNs) for multiple speech sources localization using a small scale array. First, three invariant features from the time-frequency spectrum of the input signal include generalized cross correlation (GCC) coefficients, GCC coefficients in the mel-scaled subband, and the combination of GCC coefficients and logarithmic mel spectrogram. Then the DNN labels are designed to fit the Gaussian distribution, which is similar to the spatial spectrum of the multiple signal classification. Finally, DOAs are predicted by performing peak detection on the DNN outputs, where the maximum values correspond to speech signals of interest. The DNN-based DOA estimation method outperforms the existing high resolution beamforming techniques in numerical simulations. The proposed framework implemented with a four-element microphone array can effectively localize multiple speech sources in an indoor environment.