CompNet: Complementary network for single-channel speech enhancement

Neural Netw. 2023 Nov:168:508-517. doi: 10.1016/j.neunet.2023.09.041. Epub 2023 Sep 25.

Abstract

Recent multi-domain processing methods have demonstrated promising performance on monaural speech enhancement tasks. However, few of them explain why they perform better than single-domain approaches. As an attempt to fill this gap, this paper presents a complementary single-channel speech enhancement network (CompNet) that demonstrates promising denoising capabilities and provides a unique perspective for understanding the improvements introduced by multi-domain processing. Specifically, the noisy speech is first enhanced by a time-domain network. However, although the waveform can be feasibly recovered, the distribution of the time-frequency bins may still differ in part from the target spectrum when the problem is reconsidered in the frequency domain. To solve this problem, we design a dedicated dual-path network as a post-processing module to independently filter the magnitude and refine the phase. This further drives the estimated spectrum to closely approximate the target spectrum in the time-frequency domain. We conduct extensive experiments on the WSJ0-SI84 and VoiceBank + Demand datasets. Objective test results show that the performance of the proposed system is highly competitive with existing systems.
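The idea of filtering the magnitude and refining the phase independently can be illustrated with a minimal sketch. This is not the CompNet architecture: the function name `filter_and_refine` and the arrays `gain` and `phase_shift` are hypothetical stand-ins for whatever the two paths of the post-processing network would predict, and the spectrum here is random data rather than the transform of a real time-domain estimate.

```python
import numpy as np

def filter_and_refine(spec, gain, phase_shift):
    """Apply a magnitude gain and a phase correction independently.

    spec        : complex array (freq_bins, frames), assumed to be the
                  spectrum of the time-domain network's output
    gain        : real, non-negative array of the same shape
                  (hypothetical magnitude filter)
    phase_shift : real array in radians of the same shape
                  (hypothetical phase refinement)
    """
    magnitude = np.abs(spec)   # magnitude path input
    phase = np.angle(spec)     # phase path input
    # Recombine the two independently modified components.
    return (gain * magnitude) * np.exp(1j * (phase + phase_shift))

# Toy spectrum standing in for a real STFT (assumption for illustration).
rng = np.random.default_rng(0)
spec = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

gain = np.full(spec.shape, 0.5)         # halve every magnitude
phase_shift = np.full(spec.shape, 0.1)  # rotate every bin by 0.1 rad
refined = filter_and_refine(spec, gain, phase_shift)
```

In this polar form, changing `gain` leaves the phase untouched and changing `phase_shift` leaves the magnitude untouched, which is what makes the two corrections independent.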

Keywords: Complementary; Filtering and refining; Speech enhancement; Time-domain; Time–frequency domain.

MeSH terms

  • Algorithms*
  • Noise
  • Signal-To-Noise Ratio
  • Speech*