Automatic assessment of voice quality in the context of multiple annotations

Julián Gil González; Mauricio A Álvarez; Álvaro A Orozco

doi:10.1109/EMBC.2015.7319817

Annu Int Conf IEEE Eng Med Biol Soc. 2015:2015:6236-9. doi: 10.1109/EMBC.2015.7319817.

PMID: 26737717

Abstract

Approaches to evaluating voice quality include perceptual analysis and acoustical analysis. Perceptual analysis is subjective and depends mostly on the ability of a specialist to assess a pathology, whereas acoustical analysis is objective but relies heavily on the quality of the so-called annotations that the specialist assigns to the voice signal. The quality of these annotations in turn depends heavily on the expertise and knowledge of the specialist. We consider a scenario in which the annotations are provided by several specialists with different levels of expertise and knowledge. Traditional pattern recognition methods employed in acoustical analysis are no longer applicable in this setting, since they are designed for scenarios where a single "ground-truth" label is assigned by the specialist. In this paper, we apply recent developments in machine learning that take multiple annotators into account to the acoustical analysis of voice signals. For the classification step we compare two techniques: one based on Gaussian Processes for regression with multiple annotators, and the other a multi-class Logistic Regression model that measures annotator performance in terms of sensitivity and specificity. Classifier performance is assessed in terms of Cohen's Kappa index. Results show that the multi-annotator classification schemes perform better than a traditional classifier for which the true label is estimated from the available annotations using majority voting.
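To make the baseline and the evaluation metric named in the abstract concrete, the following is a minimal sketch of majority voting over multiple annotations and of Cohen's Kappa between two label sequences. It is not the authors' implementation: the function names, the tie-breaking rule (smallest label wins), and the toy labels are all assumptions introduced here for illustration, and the data are not taken from the paper.

```python
from collections import Counter


def majority_vote(annotations):
    """Fuse per-sample annotator labels into one label per sample.

    annotations: list of lists, one inner list of labels per sample.
    Ties are broken by taking the smallest label (an arbitrary assumption).
    """
    fused = []
    for labels in annotations:
        counts = Counter(labels)
        top = max(counts.values())
        fused.append(min(label for label, c in counts.items() if c == top))
    return fused


def cohen_kappa(a, b):
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e) for two label sequences."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("label sequences must be non-empty and equal length")
    # Observed agreement: fraction of samples with identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of marginal label frequencies, summed over labels.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    if p_e == 1.0:
        return 1.0  # both sequences use one identical label; agreement is trivial
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy data: 6 samples, 3 annotators, 3 classes.
toy_annotations = [[0, 0, 1], [1, 1, 0], [2, 2, 2], [0, 1, 1], [1, 0, 0], [2, 1, 2]]
toy_reference = [0, 1, 2, 1, 1, 2]  # hypothetical reference labels

fused_labels = majority_vote(toy_annotations)
kappa = cohen_kappa(fused_labels, toy_reference)
```

In a comparison of the kind the abstract describes, the fused labels would be used to train a conventional classifier, and Kappa would be computed between each classifier's predictions and the reference labels rather than directly on the fused labels as in this toy run.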

MeSH terms

Algorithms
Humans
Logistic Models
Machine Learning
Normal Distribution
Voice Quality / physiology*