Non-motor Clinical and Biomarker Predictors Enable High Cross-Validated Accuracy Detection of Early PD but Lesser Cross-Validated Accuracy Detection of Scans Without Evidence of Dopaminergic Deficit

Front Neurol. 2020 May 11:11:364. doi: 10.3389/fneur.2020.00364. eCollection 2020.

Abstract

Background: Early stage (preclinical) detection of Parkinson's disease (PD) remains challenging yet is crucial both to differentiate it from other disorders and to facilitate timely administration of neuroprotective treatment as it becomes available.

Objective: In a cross-validation paradigm, this work focused on two binary predictive probability analyses: classification of early PD vs. controls and classification of early PD vs. SWEDD (scans without evidence of dopamine deficit). It was hypothesized that five distinct model types using combined non-motor and biomarker features would distinguish early PD from controls with >80% cross-validated (CV) accuracy, but that the diverse nature of the SWEDD category would reduce early PD vs. SWEDD CV classification accuracy and alter model-based feature selection.

Methods: Cross-sectional, baseline data were acquired from the Parkinson's Progression Markers Initiative (PPMI). Logistic regression, generalized additive (GAM), decision tree, random forest and XGBoost models were fitted using non-motor clinical and biomarker features. Randomized train and test data partitions were created. Model classification CV performance was compared using the area under the curve (AUC), sensitivity, specificity and the Kappa statistic.

Results: All five models achieved a CV AUC >0.80 in distinguishing early PD from controls. The GAM (CV AUC 0.928, sensitivity 0.898, specificity 0.897) and XGBoost (CV AUC 0.923, sensitivity 0.875, specificity 0.897) models were the top classifiers. Performance across all models was consistently lower in the early PD/SWEDD analyses, where the highest performing models were XGBoost (CV AUC 0.863, sensitivity 0.905, specificity 0.748) and random forest (CV AUC 0.822, sensitivity 0.809, specificity 0.721). XGBoost detection of non-PD SWEDD matched 1-2 year curated diagnoses in 81.25% (13/16) of cases. In both early PD/control and early PD/SWEDD analyses, and across all models, hyposmia was the single most important feature for classification; rapid eye movement behavior disorder (questionnaire) was the next most commonly high-ranked feature. Alpha-synuclein was an important feature for early PD/control but not early PD/SWEDD classification; conversely, the Epworth Sleepiness Scale was important to the latter but not the former.

Interpretation: Non-motor clinical and biomarker variables enable high CV discrimination of early PD vs. controls but are less effective at discriminating early PD from SWEDD.
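
For readers unfamiliar with the performance measures named in the Methods, the following is a minimal Python sketch of how sensitivity, specificity, Cohen's Kappa and a rank-based AUC can be computed for a binary classifier. It is an illustration under stated assumptions, not the authors' analysis code: the function names `binary_metrics` and `rank_auc` are hypothetical, and the counts and scores used to exercise them are made up rather than taken from PPMI.

```python
def binary_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and Cohen's Kappa from confusion-matrix counts.

    tp/fn: positives (e.g. early PD) classified correctly/incorrectly.
    tn/fp: negatives (e.g. controls) classified correctly/incorrectly.
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    observed = (tp + tn) / n      # observed agreement (accuracy)
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return {"sensitivity": sensitivity, "specificity": specificity, "kappa": kappa}


def rank_auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and predicted probabilities, for illustration only.
metrics = binary_metrics(tp=90, fn=10, fp=20, tn=80)
auc = rank_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is often reported alongside sensitivity and specificity when class sizes are unequal, as they may be in an early PD vs. SWEDD comparison.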

Keywords: SWEDD; XGBoost; logistic regression; predicting Parkinson's; random forest.