GBNF-VAE: A Pathological Voice Enhancement Model Based on Gold Section for Bottleneck Feature With Variational Autoencoder

Ganjun Liu; Tao Zhang; Biyun Ding; Ying Lv; Xiaohui Hou; Haoyang Guo; Yaqin Wu; Dehui Fu

doi:10.1016/j.jvoice.2023.03.012

J Voice. 2023 May 9:S0892-1997(23)00105-4. doi: 10.1016/j.jvoice.2023.03.012. Online ahead of print.

Authors

Ganjun Liu¹, Tao Zhang¹, Biyun Ding¹, Ying Lv¹, Xiaohui Hou¹, Haoyang Guo¹, Yaqin Wu², Dehui Fu³

Affiliations

¹ School of Electrical and Information Engineering, Tianjin University, Tianjin, China.
² School of Software, Shanxi Agricultural University, Shanxi, China.
³ School of Electrical and Information Engineering, Tianjin University, Tianjin, China; Department of Otolaryngology, Tianjin Medical University, Tianjin, China.

PMID: 37169702
DOI: 10.1016/j.jvoice.2023.03.012

Abstract

Objective: Speech enhancement is a promising technique for improving the quality of degraded speech signals. Most existing work focuses on separating normal speech from noise and neglects the low quality of impaired speech caused by anomalous glottal flow. To enhance pathological speech effectively, it is essential to design a separation mechanism that extracts high-dimensional timbre features and speech features separately, so that low-dimensional noise can be suppressed.

Methods: We propose an enhancement model, GBNF-VAE, that extracts timbre efficiently by reducing interference from anomalous airflow noise and then combines semantic features with the timbre features to synthesize the enhanced speech. Specifically, a bottleneck feature characterizes the timbre, and the number of bottleneck nodes is chosen with the golden section method, which improves computational efficiency. A variational autoencoder extracts the semantic features, which are combined with the timbre features to synthesize the enhanced speech.
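The abstract does not state how the golden section method is applied or which objective it optimizes. As a rough illustration only, the sketch below shows a generic golden section search over an integer range of candidate bottleneck sizes. The function name `golden_section_nodes`, the `cost` callback (for example, a validation error measured after training with n bottleneck nodes), and the assumption that this cost is unimodal in n are all hypothetical and are not taken from the paper.

```python
import math

# Inverse golden ratio, about 0.618; each step keeps roughly this
# fraction of the current search interval.
INV_PHI = (math.sqrt(5) - 1) / 2


def golden_section_nodes(cost, lo, hi):
    """Return the integer n in [lo, hi] that minimizes cost(n).

    Assumes cost is unimodal on [lo, hi]. `cost` is a hypothetical
    callback, e.g. the validation error of a network trained with
    n bottleneck nodes; results are cached because each evaluation
    may be expensive.
    """
    cache = {}

    def f(n):
        if n not in cache:
            cache[n] = cost(n)
        return cache[n]

    a, b = lo, hi
    while b - a > 2:
        # Two interior probe points placed by the golden ratio.
        c = round(b - INV_PHI * (b - a))
        d = round(a + INV_PHI * (b - a))
        if c == d:
            # Rounding can merge the probes on short intervals.
            d = c + 1
        if f(c) <= f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    # At most three candidates remain; pick the best directly.
    return min(range(a, b + 1), key=f)


if __name__ == "__main__":
    # Toy stand-in for a real training-and-validation run.
    toy_cost = lambda n: (n - 37) ** 2
    print(golden_section_nodes(toy_cost, 8, 256))
```

The appeal of this kind of search is that it needs far fewer cost evaluations than trying every candidate size, which is consistent with the efficiency claim in the abstract, though the authors' actual procedure may differ.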

Results: Spectrum observation, objective indicators, and subjective evaluation all show that GBNF-VAE performs well in enhancing pathological speech quality.

Keywords: Bottleneck feature; Golden section; Pathological speech enhancement; Variational autoencoder.