Categorizing normal and pathological voices: automated and perceptual categorization

Virgilijus Uloza; Antanas Verikas; Marija Bacauskiene; Adas Gelzinis; Ruta Pribuisiene; Marius Kaseta; Viktoras Saferis

doi:10.1016/j.jvoice.2010.04.009

J Voice. 2011 Nov;25(6):700-8. doi: 10.1016/j.jvoice.2010.04.009. Epub 2010 Jun 25.

Authors

Virgilijus Uloza¹, Antanas Verikas, Marija Bacauskiene, Adas Gelzinis, Ruta Pribuisiene, Marius Kaseta, Viktoras Saferis

Affiliation

¹ Department of Otolaryngology, Kaunas University of Medicine, Kaunas, Lithuania. virgilijus.uloza@kmuk.lt

PMID: 20579842
DOI: 10.1016/j.jvoice.2010.04.009

Abstract

Objectives: This study aimed to evaluate the accuracy of an automated voice categorization system that classifies voice signal samples into healthy and pathological classes, and to compare it with the classification accuracy attained by human experts.

Material and methods: We investigated the effectiveness of 10 different feature sets for classifying recordings of sustained phonation of the vowel /a/ into a healthy class and two pathological voice classes, and proposed a new approach to building a sequential committee of support vector machines (SVMs) for the classification. Using genetic search (a technique for solving optimization problems), we determined the hyper-parameter values of the committee and the feature sets that gave the best performance. Four experienced clinical voice specialists evaluated the same voice recordings and served as the human experts. The "gold standard" for classification was the clinically and histologically proven diagnosis.
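To make the idea of genetic search concrete, the following is a minimal sketch and not the authors' implementation. The search space (log10 of an SVM regularization constant and of a kernel width), the function names, and the toy fitness function are all assumptions made for illustration; in a real setting the fitness would be a cross-validated classification accuracy, and the search would also cover feature-set selection, which is omitted here.

```python
import random

# Hypothetical search space: log10 of the SVM regularization constant C and
# log10 of an RBF kernel width. These bounds are illustrative assumptions.
BOUNDS = [(-2.0, 3.0), (-4.0, 1.0)]


def toy_fitness(genes):
    """Stand-in for cross-validated accuracy; peaks at genes == [1.0, -2.0]."""
    log_c, log_gamma = genes
    return 1.0 / (1.0 + (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2)


def clip(value, low, high):
    """Keep a gene inside its allowed interval."""
    return max(low, min(high, value))


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_scale=0.3, seed=0):
    """Simple genetic algorithm: truncation selection, uniform crossover,
    Gaussian mutation, and elitism (the best individual always survives)."""
    rng = random.Random(seed)
    population = [[rng.uniform(low, high) for low, high in bounds]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[:pop_size // 2]      # keep the fitter half as parents
        children = [ranked[0][:]]             # elitism: copy the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Gaussian mutation, clipped back into the search bounds.
            child = [clip(g + rng.gauss(0.0, mutation_scale), low, high)
                     for g, (low, high) in zip(child, bounds)]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, history


best_genes, fitness_history = genetic_search(toy_fitness, BOUNDS)
```

Because the best individual is copied unchanged into each new generation, the best fitness recorded in `fitness_history` never decreases from one generation to the next.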

Results: The committee considerably improved classification accuracy compared with classifiers based on a single feature type. In experiments on 444 voice recordings from 148 subjects (three recordings per subject), the correct classification rate (CCR) was over 92% for two classes (healthy versus pathological) and over 90% for three classes (healthy voice and two pathological classes, nodular and diffuse lesions). The CCR obtained by the human experts was about 74% and 60%, respectively.
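As a reading aid, the CCR can be understood as the fraction of recordings whose predicted class matches the reference diagnosis. The sketch below assumes that plain definition; the function name and the four example labels are made up for illustration and are not data from the study.

```python
def correct_classification_rate(true_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the reference label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Dataset size as stated in the abstract: 148 subjects, three recordings each.
n_recordings = 148 * 3

# Made-up labels for illustration only (three of four predictions agree).
truth = ["healthy", "nodular", "diffuse", "healthy"]
predicted = ["healthy", "nodular", "healthy", "healthy"]
ccr = correct_classification_rate(truth, predicted)
```

With these illustrative labels, three of the four predictions match, so `ccr` is 0.75, and `n_recordings` reproduces the 444 recordings reported above.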

Conclusion: Under the same experimental conditions, the automated voice discrimination technique based on a sequential committee of SVMs was considerably more effective than the human experts.

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Adolescent
Adult
Aged
Auditory Perception
Automation
Dysphonia / classification*
Dysphonia / diagnosis
Female
Humans
Male
Middle Aged
Support Vector Machine
Voice*
Young Adult