Classification of Parkinson's Disease from Voice - Analysis of Data Selection Bias

Stud Health Technol Inform. 2023 May 18:302:127-128. doi: 10.3233/SHTI230079.

Abstract

A growing number of studies have investigated biomarkers of Parkinson's disease (PD) using mobile technology. Many have reported high accuracy in PD classification using machine learning (ML) and voice recordings from the mPower study, a large database of PD patients and healthy controls. Because the dataset has unbalanced class, gender, and age distributions, appropriate sampling must be considered when assessing classification scores. We analyse biases such as identity confounding and the implicit learning of non-disease-specific characteristics, and we present a sampling strategy to highlight and prevent these problems.
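The abstract does not detail the authors' sampling strategy. The following is only a generic sketch of two common safeguards against the biases named above, not the published method. First, PD and control subjects are matched within each (gender, age bin) stratum so that a classifier cannot separate the classes by age or gender alone. Second, the train/test split is made per subject rather than per recording, so no speaker's voice appears on both sides (identity confounding). The field names `subject_id`, `label` (1 = PD, 0 = control), `gender`, and `age`, together with the 10-year age bins, are hypothetical assumptions and are not taken from the mPower schema.

```python
import random
from collections import defaultdict


def subject_table(recordings):
    """Collapse recording-level rows into one (label, gender, age) entry per subject.

    Assumes hypothetical keys: subject_id, label (1 = PD, 0 = control), gender, age.
    """
    subjects = {}
    for rec in recordings:
        subjects.setdefault(rec["subject_id"], (rec["label"], rec["gender"], rec["age"]))
    return subjects


def matched_subjects(recordings, age_bin_width=10, seed=0):
    """Keep equal numbers of PD and control subjects in each (gender, age bin) stratum."""
    rng = random.Random(seed)
    strata = defaultdict(lambda: {0: [], 1: []})
    for sid, (label, gender, age) in sorted(subject_table(recordings).items()):
        strata[(gender, age // age_bin_width)][label].append(sid)
    selected = set()
    for groups in strata.values():
        n = min(len(groups[0]), len(groups[1]))  # size of the smaller class in this stratum
        for label in (0, 1):
            selected.update(rng.sample(groups[label], n))
    return selected


def subject_wise_split(recordings, selected, test_fraction=0.2, seed=0):
    """Split selected subjects (not recordings) per class, so no speaker is in both sets."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sid, (label, _, _) in sorted(subject_table(recordings).items()):
        if sid in selected:
            by_label[label].append(sid)
    train, test = set(), set()
    for label in sorted(by_label):
        ids = by_label[label]
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test.update(ids[:n_test])
        train.update(ids[n_test:])
    return train, test
```

Recordings are then filtered by subject, for example `[r for r in recordings if r["subject_id"] in train]`. Every recording of a given speaker therefore lands in exactly one partition. Matching at the subject level rather than the recording level also keeps speakers who contributed many recordings from dominating a stratum.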

Keywords: Machine Learning; Parkinson’s Disease; Selection Bias.

MeSH terms

  • Humans
  • Machine Learning
  • Parkinson Disease* / diagnosis
  • Selection Bias
  • Voice*