A two-stage variable selection and classification approach for Parkinson's disease detection by using voice recording replications

Comput Methods Programs Biomed. 2017 Apr:142:147-156. doi: 10.1016/j.cmpb.2017.02.019. Epub 2017 Feb 22.

Abstract

Background and objective: The scientific literature lacks variable selection and classification methods that account for replicated data. This work is motivated by the problem of discriminating people with Parkinson's disease (PD) from healthy subjects using acoustic features automatically extracted from replicated voice recordings.

Methods: A two-stage variable selection and classification approach was developed to match the replication-based experimental design. The statistical model is specified so that the computations can be carried out with an easy-to-implement Gibbs sampling algorithm.
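
As a rough illustration of why Gibbs sampling is convenient for Bayesian binary regression, the following minimal sketch uses latent-variable data augmentation for a one-covariate model. It is not the authors' two-stage method: it assumes a probit link and a zero-mean normal prior (neither is stated in the abstract), omits the variable selection stage, and ignores the replication structure. The function names, the toy data, and the prior variance are all hypothetical.

```python
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # standard normal, used for cdf / inverse cdf


def sample_truncated_normal(mu, positive, rng):
    """Draw from N(mu, 1) truncated to (0, inf) if positive, else (-inf, 0)."""
    p_below = STD_NORMAL.cdf(-mu)  # mass of N(mu, 1) lying below zero
    u = rng.uniform(p_below, 1.0) if positive else rng.uniform(0.0, p_below)
    # Clamp so inv_cdf stays finite (a small approximation in extreme tails).
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mu + STD_NORMAL.inv_cdf(u)


def gibbs_probit(x, y, n_iter=2000, burn_in=500, prior_var=10.0, seed=0):
    """Gibbs sampler for y_i ~ Bernoulli(Phi(beta * x_i)), beta ~ N(0, prior_var).

    Assumed model for illustration only; returns post-burn-in draws of beta.
    """
    rng = random.Random(seed)
    beta = 0.0
    # With unit-variance latent variables the conditional variance is fixed.
    post_var = 1.0 / (sum(xi * xi for xi in x) + 1.0 / prior_var)
    post_sd = post_var ** 0.5
    draws = []
    for it in range(n_iter):
        # Step 1: latent z_i | beta, y_i is a truncated normal whose sign matches y_i.
        z = [sample_truncated_normal(beta * xi, yi == 1, rng)
             for xi, yi in zip(x, y)]
        # Step 2: beta | z is normal (conjugate update).
        post_mean = post_var * sum(xi * zi for xi, zi in zip(x, z))
        beta = rng.gauss(post_mean, post_sd)
        if it >= burn_in:
            draws.append(beta)
    return draws


# Toy data (hypothetical): larger x tends to go with class 1.
x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
y = [0, 0, 0, 1, 0, 1, 1, 1]
draws = gibbs_probit(x, y)
print(f"posterior mean of beta: {mean(draws):.2f}")
```

Both conditional distributions are standard (a truncated normal and a normal), so each iteration needs only direct draws and no tuning, which is the sense in which such a sampler is easy to implement.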

Results: The proposed approach achieves acceptable predictive capacity for PD discrimination on the database considered, despite the relatively small sample size. Specifically, the accuracy, sensitivity, and specificity are 86.2%, 82.5%, and 90.0%, respectively. More importantly, the results are more interpretable, and the approach shows better chain mixing and lower computation time than the classification-only approaches in the scientific literature.
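
For readers unfamiliar with the three reported rates, the sketch below shows how accuracy, sensitivity, and specificity follow from a confusion matrix. The counts are hypothetical, chosen only so that the rates land near the reported values; they are not the study's actual counts, which the abstract does not give.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp/fn: diseased subjects classified correctly/incorrectly;
    tn/fp: healthy subjects classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)          # fraction of diseased subjects detected
    specificity = tn / (tn + fp)          # fraction of healthy subjects cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts for illustration only (not taken from the paper).
acc, sens, spec = classification_metrics(tp=33, fn=7, tn=36, fp=4)
print(f"accuracy={acc:.4f}, sensitivity={sens:.3f}, specificity={spec:.3f}")
```

When the two groups are the same size, as in this illustration, accuracy is the simple average of sensitivity and specificity.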

Conclusions: To the best of the authors' knowledge, this is the first approach developed to properly account for intra-subject variability in variable selection and classification. Although the proposed approach was applied to PD discrimination, it can also be applied in other contexts with similar replication-based experimental designs.

Keywords: Bayesian binary regression; Gibbs sampling; Parkinson’s disease; Replicated measurements; Variable selection; Voice features.

MeSH terms

  • Algorithms
  • Artificial Intelligence
  • Bayes Theorem
  • Databases, Factual
  • Diagnosis, Computer-Assisted*
  • Humans
  • Models, Statistical
  • Parkinson Disease / diagnosis*
  • Regression Analysis
  • Reproducibility of Results
  • Sample Size
  • Sensitivity and Specificity
  • Software
  • Speech Acoustics*
  • Voice*