GBNF-VAE: A Pathological Voice Enhancement Model Based on Gold Section for Bottleneck Feature With Variational Autoencoder

J Voice. 2023 May 9:S0892-1997(23)00105-4. doi: 10.1016/j.jvoice.2023.03.012. Online ahead of print.

Abstract

Objective: Speech enhancement has become a promising technique to accommodate demands of the improvement in quality of a degraded speech signal. The main works now focus on separating normal speech from noise, but have neglected the low quality of impaired speech influenced by anomalous glottis flow. In order to effectively enhance the pathological speech, it is essential to design a separation mechanism for extracting high-dimensional timbre features and speech features separately to suppress low-dimensional noises.

Methods: In this paper, we propose an enhancement model GBNF-VAE to extract timbre efficiently by reducing anomalous airflow noise interference, and by combining the semantic features with timbre features to synthesize the enhanced speech. In particular, the bottleneck feature can characterize the timbre by the controlled number of nodes through the Golden Section method, which effectively improves computational efficiency. In addition, variational autoencoder is adopted to extract semantic features which are combined with the previous timbre features to synthesize the enhanced speech.

Results: Finally, spectrum observation, objective indicators and subjective evaluation all show the outstanding performance of GBNF-VAE in pathological speech quality enhancement.

Keywords: Bottleneck feature; Golden section; Pathological speech enhancement; Variational autoencoder.