Detection of Pathological Voice Using Cepstrum Vectors: A Deep Learning Approach

Shih-Hau Fang; Yu Tsao; Min-Jing Hsiao; Ji-Ying Chen; Ying-Hui Lai; Feng-Chuan Lin; Chi-Te Wang

doi:10.1016/j.jvoice.2018.02.003

J Voice. 2019 Sep;33(5):634-641. doi: 10.1016/j.jvoice.2018.02.003. Epub 2018 Mar 19.

Authors

Shih-Hau Fang¹, Yu Tsao², Min-Jing Hsiao¹, Ji-Ying Chen¹, Ying-Hui Lai³, Feng-Chuan Lin⁴, Chi-Te Wang⁵

Affiliations

¹ Department of Electric Engineering, Yuan Ze University, Taoyuan, Taiwan.
² Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan.
³ Institute of Biomedical Engineering, National Yang-Ming University, Taipei, Taiwan.
⁴ Department of Otolaryngology Head and Neck Surgery, Far Eastern Memorial Hospital, Taipei, Taiwan.
⁵ Department of Otolaryngology Head and Neck Surgery, Far Eastern Memorial Hospital, Taipei, Taiwan; Department of Special Education, University of Taipei, Taipei, Taiwan; Department of Otolaryngology Head and Neck Surgery, National Taiwan University College of Medicine, Taipei, Taiwan. Electronic address: drwangct@gmail.com.

PMID: 29567049

Abstract

Objectives: Computerized detection of voice disorders has attracted considerable academic and clinical interest in the hope of providing an effective screening method for voice diseases before endoscopic confirmation. This study proposes a deep-learning-based approach to detect pathological voice and examines its performance and utility compared with other automatic classification algorithms.

Methods: This study retrospectively collected 60 normal voice samples and 402 pathological voice samples, covering 8 common clinical voice disorders, at the voice clinic of a tertiary teaching hospital. We extracted Mel-frequency cepstral coefficients (MFCCs) from 3-second samples of a sustained vowel. The performance of three machine learning algorithms, namely, a deep neural network (DNN), a support vector machine (SVM), and a Gaussian mixture model (GMM), was evaluated using fivefold cross-validation. Cases from the voice disorder database of the Massachusetts Eye and Ear Infirmary (MEEI) were used to verify the performance of the classification mechanisms.
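
The abstract does not give the MFCC settings used in the study. The following is a minimal sketch of a conventional MFCC pipeline (pre-emphasis, Hamming-windowed framing, power spectrum, triangular mel filterbank, log, DCT-II), written in pure Python so it is self-contained. Every parameter (25 ms frames, 10 ms hop, 26 filters, 13 coefficients, 0.97 pre-emphasis, the 2595·log10(1 + f/700) mel formula) is a common default assumed here, not the authors' configuration. The naive DFT is for illustration only; a practical pipeline would use an FFT library.

```python
import math


def hz_to_mel(f):
    # HTK-style mel scale (assumed; the abstract does not name a variant)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, covering 0 Hz to Nyquist."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, centre):          # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):         # falling edge (peak 1.0 at centre)
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def power_spectrum(frame, n_fft):
    """|DFT|^2 / n_fft for bins 0..n_fft//2 (naive O(n^2) DFT, for self-containment only)."""
    padded = frame + [0.0] * (n_fft - len(frame))
    spectrum = []
    for k in range(n_fft // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * n / n_fft) for n, x in enumerate(padded))
        im = -sum(x * math.sin(2.0 * math.pi * k * n / n_fft) for n, x in enumerate(padded))
        spectrum.append((re * re + im * im) / n_fft)
    return spectrum


def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]


def mfcc(signal, sample_rate, n_mfcc=13, frame_ms=25.0, hop_ms=10.0,
         n_fft=512, n_filters=26, pre_emphasis=0.97):
    """Return one n_mfcc-dimensional vector per frame (all defaults are assumptions)."""
    # 1. Pre-emphasis boosts high frequencies: y[n] = x[n] - a * x[n-1]
    emphasized = [signal[0]] + [signal[i] - pre_emphasis * signal[i - 1]
                                for i in range(1, len(signal))]
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    if frame_len > n_fft:
        raise ValueError("frame length exceeds n_fft")
    # 2. Hamming window applied to each frame
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frame = [x * w for x, w in zip(emphasized[start:start + frame_len], window)]
        spectrum = power_spectrum(frame, n_fft)                                  # 3. power spectrum
        energies = [sum(f * p for f, p in zip(row, spectrum)) for row in bank]  # 4. mel filterbank
        log_energies = [math.log(max(e, 1e-10)) for e in energies]              # 5. log (floored)
        features.append(dct_ii(log_energies, n_mfcc))                           # 6. DCT-II -> cepstrum
    return features


# Toy input: 50 ms of a 440 Hz tone at 8 kHz (not study data)
sr = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * n / sr) for n in range(400)]
coeffs = mfcc(tone, sr, n_fft=256)
print(len(coeffs), len(coeffs[0]))  # prints "3 13": 3 frames x 13 coefficients
```

With these assumed defaults, a 3-second recording at 16 kHz would yield about 300 frame-level vectors; how the study reduced these to its "representative" MFCC features is not stated in the abstract.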

Results: The experimental results demonstrated that the DNN outperformed the Gaussian mixture model and the support vector machine. Using three representative Mel-frequency cepstral coefficient features, its accuracy in detecting voice pathologies reached 94.26% in male subjects and 90.52% in female subjects. When applied to the MEEI database for validation, the DNN also achieved higher accuracy (99.32%) than the other two classification algorithms.
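
The accuracies above come from fivefold cross-validation. The sketch below shows one common way such a figure is computed, under assumptions the abstract does not confirm: folds are stratified by class, and out-of-fold predictions are pooled before scoring (averaging per-fold accuracies is an equally common alternative). The `fit_predict` callback and `majority_class_model` are hypothetical stand-ins for a real classifier. The example pools both sexes, whereas the study reported sex-specific accuracies; it uses only the class sizes from the Methods to compute the accuracy of a classifier that always predicts "pathological".

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, dealing each class out round-robin
    after shuffling so every fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validated_accuracy(labels, fit_predict, k=5, seed=0):
    """Hold out each fold once; pool the out-of-fold predictions and score them together.
    fit_predict(train_idx, test_idx) -> predicted labels for test_idx (hypothetical callback)."""
    preds = [None] * len(labels)
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        for i, p in zip(test_idx, fit_predict(train_idx, test_idx)):
            preds[i] = p
    return accuracy(labels, preds)


def majority_class_model(labels):
    """Baseline 'classifier' that always predicts the most frequent training label."""
    def fit_predict(train_idx, test_idx):
        train_labels = [labels[i] for i in train_idx]
        majority = max(set(train_labels), key=train_labels.count)
        return [majority] * len(test_idx)
    return fit_predict


# Class sizes from the Methods: 0 = normal (60), 1 = pathological (402)
labels = [0] * 60 + [1] * 402
folds = stratified_folds(labels)
baseline = cross_validated_accuracy(labels, majority_class_model(labels))
print([len(f) for f in folds], round(baseline, 4))  # prints "[93, 93, 92, 92, 92] 0.8701"
```

Because pathological samples make up 402 of the 462 recordings, the always-pathological baseline scores 402/462 ≈ 87.0%, which is useful context when reading accuracy figures on this class balance.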

Conclusions: By stacking several layers of neurons with optimized weights, the proposed DNN algorithm can fully utilize the acoustic features and efficiently differentiate between normal and pathological voice samples. Building on this pilot study, future research may explore further applications of DNNs from laboratory and clinical perspectives.
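
The abstract does not describe the DNN architecture. As a sketch of what "stacking several layers of neurons" means in a fully connected network, each layer below applies an affine map followed by a nonlinearity, and a final sigmoid unit turns the last hidden layer into a score. The layer sizes, ReLU activations, sigmoid output, 13-dimensional input, and He-style initialization are all assumptions. The weights here are random; the training step that would optimize them (for example, backpropagation with gradient descent) is not shown.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: out[j] = sum_i weights[j][i] * x[i] + bias[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, z) for z in v]


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_layers(sizes, seed=0):
    """Random weights for a stack of dense layers, e.g. sizes=[13, 32, 32, 1].
    He-style Gaussian initialization (assumed); biases start at zero."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def dnn_forward(features, layers):
    """Stacked ReLU hidden layers, then one sigmoid unit giving a score in (0, 1)."""
    h = features
    for weights, bias in layers[:-1]:
        h = relu(dense(h, weights, bias))
    out_weights, out_bias = layers[-1]
    return sigmoid(dense(h, out_weights, out_bias)[0])


# Untrained network on a made-up 13-dimensional feature vector (illustration only)
layers = init_layers([13, 32, 32, 1])
p = dnn_forward([0.1] * 13, layers)
label = "pathological" if p >= 0.5 else "normal"
```

In a trained model, the random weights from `init_layers` would be replaced by learned ones; the 0.5 decision threshold used for `label` is likewise an assumption.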

Keywords: Neoplasm; Nodule; Polyp; Spasmodic dysphonia; Sulcus.

Publication types

Comparative Study

MeSH terms

Acoustics*
Adult
Aged
Aged, 80 and over
Deep Learning*
Diagnosis, Computer-Assisted
Dysphonia / diagnosis*
Dysphonia / physiopathology
Female
Humans
Male
Middle Aged
Pilot Projects
Predictive Value of Tests
Reproducibility of Results
Retrospective Studies
Signal Processing, Computer-Assisted*
Sound Spectrography
Speech Acoustics*
Speech Production Measurement*
Support Vector Machine*
Vocal Cords / pathology
Vocal Cords / physiopathology*
Voice Quality*
Young Adult