Categorizing normal and pathological voices: automated and perceptual categorization

J Voice. 2011 Nov;25(6):700-8. doi: 10.1016/j.jvoice.2010.04.009. Epub 2010 Jun 25.

Abstract

Objectives: The aims of the present study were to evaluate the accuracy of an automated voice categorization system developed to classify voice signal samples into healthy and pathological classes, and to compare it with the classification accuracy attained by human experts.

Material and methods: We investigated the effectiveness of 10 different feature sets in classifying voice recordings of the sustained phonation of the vowel sound /a/ into one healthy and two pathological voice classes, and proposed a new approach to building a sequential committee of support vector machines (SVMs) for the classification. By applying "genetic search" (a search technique used to find solutions to optimization problems), we determined the optimal values of the hyper-parameters of the committee and the feature sets that provided the best performance. Four experienced clinical voice specialists who evaluated the same voice recordings served as experts. The "gold standard" for classification was the clinically and histologically proven diagnosis.
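As an illustration of the kind of genetic search mentioned above, the following is a minimal sketch and not the authors' implementation. It evolves a 10-bit mask that selects which of the 10 feature sets to use. The fitness function `toy_fitness`, the function name `genetic_search`, and all parameter values are hypothetical; in a real experiment the fitness would be something like the cross-validated accuracy of the SVM committee, and the encoding would also have to cover the hyper-parameters, which this sketch leaves out.

```python
import random

N_FEATURE_SETS = 10  # the abstract mentions 10 feature sets


def toy_fitness(mask):
    """Hypothetical stand-in for the cross-validated accuracy of a committee.

    Assumption for illustration only: feature sets 1, 4 and 7 are useful,
    and every selected feature set carries a small cost.
    """
    useful = {1, 4, 7}
    hits = sum(1 for i in useful if mask[i])
    return hits - 0.1 * sum(mask)


def genetic_search(fitness, n_bits=N_FEATURE_SETS, pop_size=20,
                   generations=40, mutation_rate=0.05, seed=0):
    """Return (best_mask, best_score) found by a simple genetic algorithm."""
    rng = random.Random(seed)
    # Start from random bit masks; bit i == 1 means "use feature set i".
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament():
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_population = [best[:]]  # elitism: the best mask always survives
        while len(next_population) < pop_size:
            parent1, parent2 = tournament(), tournament()
            cut = rng.randint(1, n_bits - 1)  # one-point crossover
            child = parent1[:cut] + parent2[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    mask, score = genetic_search(toy_fitness)
    print("selected feature sets:", [i for i, bit in enumerate(mask) if bit])
    print("fitness:", score)
```

Because the best mask is copied into every new generation, the best fitness never decreases from one generation to the next. A fixed seed makes the run repeatable.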

Results: The committee gave a considerable improvement in classification accuracy over classifiers based on a single feature type. In experiments using 444 voice recordings from 148 subjects (three recordings per subject), we obtained a correct classification rate (CCR) of over 92% when classifying into healthy and pathological voice classes, and over 90% when classifying into three classes (healthy voice, nodular lesion voice, and diffuse lesion voice). The CCR obtained from human experts was about 74% and 60%, respectively.
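To make the reported figures concrete, the sketch below computes a correct classification rate as the fraction of recordings whose predicted class matches the reference class. The function name and the toy labels are illustrative assumptions, not data from the study. The last lines show what "over 92%" would mean in recordings if the rate were computed per recording over all 444 recordings; the abstract does not say how the rates were aggregated, so that is an assumption as well.

```python
import math


def correct_classification_rate(true_labels, predicted_labels):
    """Fraction of items whose predicted label equals the reference label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Toy labels for illustration only (not data from the study).
reference = ["healthy", "nodular", "diffuse", "healthy"]
predicted = ["healthy", "nodular", "healthy", "healthy"]
toy_ccr = correct_classification_rate(reference, predicted)  # 3 of 4 correct

# Data set size stated in the abstract: 148 subjects, 3 recordings each.
n_recordings = 148 * 3

# Assumption: CCR is taken per recording over the whole set. Then a CCR
# strictly above 92% needs at least this many correctly classified recordings.
min_correct_for_92 = math.floor(0.92 * n_recordings) + 1

if __name__ == "__main__":
    print(toy_ccr, n_recordings, min_correct_for_92)
```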

Conclusion: When operating under the same experimental conditions, the automated voice discrimination technique based on a sequential committee of SVMs was considerably more effective than the human experts.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Auditory Perception
  • Automation
  • Dysphonia / classification*
  • Dysphonia / diagnosis
  • Female
  • Humans
  • Male
  • Middle Aged
  • Support Vector Machine
  • Voice*
  • Young Adult