Using SincNet for Learning Pathological Voice Disorders

Chao-Hsiang Hung; Syu-Siang Wang; Chi-Te Wang; Shih-Hau Fang

doi:10.3390/s22176634

Sensors (Basel). 2022 Sep 2;22(17):6634. doi: 10.3390/s22176634.

Authors

Chao-Hsiang Hung¹, Syu-Siang Wang¹, Chi-Te Wang², Shih-Hau Fang¹

Affiliations

¹ Department of Electrical Engineering, Yuan Ze University, Taoyuan 320, Taiwan.
² Department of Otolaryngology Head and Neck Surgery, Far Eastern Memorial Hospital, New Taipei City 220, Taiwan.

Abstract

Deep learning techniques such as convolutional neural networks (CNNs) have been successfully applied to identify pathological voices. However, a major disadvantage of these models is that their predictions are difficult to interpret. This drawback has become a bottleneck for the wider adoption of voice-disorder classification and detection systems, especially during the pandemic. In this paper, we propose replacing the first layer of a commonly used CNN with a series of learnable sinc functions to build an explainable SincNet system for classifying or detecting pathological voices. The sinc filters, which act as the front-end signal processor in SincNet, make this first layer meaningful: they directly extract the acoustic features from which the subsequent layers derive high-level voice information. We evaluated the system on three different Far Eastern Memorial Hospital voice datasets. In these evaluations, the proposed approach improved accuracy by up to 7% and sensitivity by up to 9% over conventional methods, demonstrating the superior performance of the SincNet system in classifying pathological input waveforms. More importantly, based on the evaluation results, we offer possible explanations of the relationship between the system output and the speech features extracted by the first layer.
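
The idea of a sinc-based first layer can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the cutoff frequencies (300 Hz and 3000 Hz), the kernel length (251 taps), the Hamming window, and the 16 kHz sampling rate are all assumptions chosen for illustration. In SincNet the two cutoffs of each filter are learned during training, whereas here they are fixed so that the band-pass shape defined by only two parameters is easy to inspect.

```python
import numpy as np


def sinc_bandpass(f_low_hz, f_high_hz, kernel_size=251, sample_rate=16000):
    """Build a windowed band-pass FIR kernel defined only by its two cutoffs.

    All default values are illustrative assumptions, not taken from the paper.
    """
    # Cutoffs expressed in cycles per sample.
    f1 = f_low_hz / sample_rate
    f2 = f_high_hz / sample_rate
    # Time index centred on zero so the kernel is symmetric (linear phase).
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # A band-pass filter is the difference of two low-pass sinc filters.
    # Note: np.sinc(x) computes sin(pi * x) / (pi * x).
    kernel = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    # A Hamming window reduces the ripple caused by truncating the sinc.
    return kernel * np.hamming(kernel_size)


def magnitude_response(kernel, freq_hz, sample_rate=16000):
    """Evaluate the magnitude of the kernel's frequency response at freq_hz."""
    n = np.arange(len(kernel))
    phase = np.exp(-2j * np.pi * freq_hz / sample_rate * n)
    return float(np.abs(np.sum(kernel * phase)))


# Hypothetical usage: filter one second of synthetic noise standing in for a voice signal.
kernel = sinc_bandpass(300.0, 3000.0)
waveform = np.random.default_rng(0).standard_normal(16000)
features = np.convolve(waveform, kernel, mode="valid")
```

Because each filter is fully described by a lower and an upper cutoff, the frequency band it passes can be read off directly, which is the property that makes such a first layer easier to interpret than a layer of unconstrained convolution kernels.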

Keywords: SincNet; classification; convolutional neural network; pathological voice; sinc functions.

MeSH terms

Acoustics
Humans
Neural Networks, Computer
Voice Disorders* / diagnosis
Voice*

Grants and funding

The authors would like to thank the National Science and Technology Council for providing financial support (MOST 106-2314-B-418-003, 111-2622-E-155-002, and 111-2221-E-155-020).