Dysarthric Speech Enhancement Based on Convolution Neural Network

Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:60-64. doi: 10.1109/EMBC48229.2022.9871531.

Abstract

Patients with dysarthria generally produce distorted speech with reduced intelligibility for both human listeners and machines. To enhance the intelligibility of dysarthric speech, we apply a deep learning-based speech enhancement (SE) system to this task. Conventional SE approaches suppress noise components in a noise-corrupted input, thereby improving sound quality and intelligibility simultaneously. In this study, we focus instead on reconstructing the severely distorted signal of dysarthric speech to improve intelligibility. The proposed SE system trains a convolutional neural network (CNN) model in the training phase, which is then used to process dysarthric speech in the testing phase. Training requires paired dysarthric-normal speech utterances, which we align using a dynamic time warping technique. The aligned training data are then used to train a CNN-based SE model. The proposed SE system is evaluated with the Google automatic speech recognition (ASR) system and a subjective listening test. The results show that the proposed method notably improves recognition performance, by more than 10% for both ASR and human listeners, compared with unprocessed dysarthric speech. Clinical Relevance- This study improves the intelligibility and ASR accuracy of dysarthric speech by more than 10%.
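The abstract outlines a two-step pipeline: align paired dysarthric-normal utterances with dynamic time warping (DTW), then train a CNN to map dysarthric spectral features to the aligned normal-speech targets. The following is a minimal sketch of that idea, assuming librosa for feature extraction and DTW and PyTorch for the CNN; the feature type (log-magnitude STFT), network shape, and training details are illustrative assumptions and not the paper's exact configuration.

```python
# Sketch: DTW-align paired dysarthric/normal spectrograms, then train a small
# CNN to map dysarthric features to normal-speech features with an MSE loss.
# All hyperparameters here are assumptions for illustration only.
import numpy as np
import librosa
import torch
import torch.nn as nn

def log_spectrogram(wav, n_fft=512, hop=256):
    """Log-magnitude STFT features, shape (freq_bins, frames)."""
    spec = np.abs(librosa.stft(wav, n_fft=n_fft, hop_length=hop))
    return np.log1p(spec)

def dtw_align(dys_feat, nrm_feat):
    """Align a normal utterance to the dysarthric one with DTW.

    Returns two feature matrices with the same number of frames, usable as
    (input, target) pairs for frame-wise training.
    """
    # librosa expects features as (n_features, n_frames)
    _, wp = librosa.sequence.dtw(X=dys_feat, Y=nrm_feat, metric="euclidean")
    wp = wp[::-1]                       # warping path is returned in reverse
    return dys_feat[:, wp[:, 0]], nrm_feat[:, wp[:, 1]]

class CNNEnhancer(nn.Module):
    """A small fully-convolutional mapping from dysarthric to normal spectra."""
    def __init__(self, freq_bins=257):  # 257 bins for a 512-point FFT
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(freq_bins, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(256, freq_bins, kernel_size=5, padding=2),
        )

    def forward(self, x):               # x: (batch, freq_bins, frames)
        return self.net(x)

def train_step(model, optimizer, dys_aligned, nrm_aligned):
    """One MSE training step on a single aligned utterance pair."""
    x = torch.from_numpy(dys_aligned).float().unsqueeze(0)   # (1, F, T)
    y = torch.from_numpy(nrm_aligned).float().unsqueeze(0)
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

At test time, a common choice (not specified in this abstract) is to feed the enhanced log-magnitude spectrogram back through an inverse STFT using the input utterance's phase to obtain the enhanced waveform.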

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Auditory Perception
  • Dysarthria* / diagnosis
  • Humans
  • Neural Networks, Computer
  • Sound
  • Speech*