A temporal-spectral generative adversarial network based end-to-end packet loss concealment for wideband speech transmission

J Acoust Soc Am. 2021 Oct;150(4):2577. doi: 10.1121/10.0006528.

Abstract

Packet loss concealment (PLC) aims to mitigate speech impairments caused by packet losses so as to improve speech perceptual quality. This paper proposes an end-to-end PLC algorithm with a time-frequency hybrid generative adversarial network, which incorporates a dilated residual convolution and the integration of a time-domain discriminator and frequency-domain discriminator into a convolutional encoder-decoder architecture. The dilated residual convolution is employed to aggregate the short-term and long-term context information of lost speech frames through two network receptive fields with different dilation rates, and the integrated time-frequency discriminators are proposed to learn multi-resolution time-frequency features from correctly received speech frames with both time-domain waveform and frequency-domain complex spectrums. Both causal and noncausal strategies are proposed for the packet-loss problem, which can effectively reduce the transitional distortion caused by lost speech frames with a significantly reduced number of training parameters and computational complexity. The experimental results show that the proposed method can achieve better performance in terms of three objective measurements, including the signal-to-noise ratio, perceptual evaluation of speech quality, and short-time objective intelligibility. The results of the subjective listening test further confirm a better performance in the speech perceptual quality.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Auditory Perception
  • Signal-To-Noise Ratio
  • Speech*