Improving the Efficiency of Dysarthria Voice Conversion System Based on Data Augmentation

IEEE Trans Neural Syst Rehabil Eng. 2023:31:4613-4623. doi: 10.1109/TNSRE.2023.3331524. Epub 2023 Nov 30.

Abstract

Dysarthria, a speech disorder often caused by neurological damage, compromises the control of vocal muscles in patients, making their speech unclear and communication troublesome. Recently, voice-driven methods have been proposed to improve the speech intelligibility of patients with dysarthria. However, most methods require a significant representation of both the patient's and target speaker's corpus, which is problematic. This study aims to propose a data augmentation-based voice conversion (VC) system to reduce the recording burden on the speaker. We propose dysarthria voice conversion 3.1 (DVC 3.1) based on a data augmentation approach, including text-to-speech and StarGAN-VC architecture, to synthesize a large target and patient-like corpus to lower the burden of recording. An objective evaluation metric of the Google automatic speech recognition (Google ASR) system and a listening test were used to demonstrate the speech intelligibility benefits of DVC 3.1 under free-talk conditions. The DVC system without data augmentation (DVC 3.0) was used for comparison. Subjective and objective evaluation based on the experimental results indicated that the proposed DVC 3.1 system enhanced the Google ASR of two dysarthria patients by approximately [62.4%, 43.3%] and [55.9%, 57.3%] compared to unprocessed dysarthria speech and the DVC 3.0 system, respectively. Further, the proposed DVC 3.1 increased the speech intelligibility of two dysarthria patients by approximately [54.2%, 22.3%] and [63.4%, 70.1%] compared to unprocessed dysarthria speech and the DVC 3.0 system, respectively. The proposed DVC 3.1 system offers significant potential to improve the speech intelligibility performance of patients with dysarthria and enhance verbal communication quality.

MeSH terms

  • Dysarthria* / etiology
  • Humans
  • Laryngeal Muscles
  • Speech Intelligibility / physiology
  • Voice*