Introduction: The objective of this study was to build a machine learning model that predicts incident mild cognitive impairment, Alzheimer's disease, and related dementias using structured data from administrative and electronic health record sources.
Methods: A cohort of patients (n = 121,907) and controls (n = 5,307,045) was created for modeling using data within 2 years of each patient's incident diagnosis date. Additional cohorts 3-8 years removed from index data were used for prediction. Training cohorts were matched on age, gender, index year, and utilization, and were fit with a gradient boosting machine, LightGBM.
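The matching step can be illustrated with a minimal sketch. It assumes exact matching on discretized strata, a 1:1 case-to-control ratio, and sampling without replacement; none of these details are stated above, and the field names (age_band, gender, index_year, util_band) and the helper match_controls are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical matching variables; the banding of age and utilization is an assumption.
MATCH_KEYS = ("age_band", "gender", "index_year", "util_band")

def match_controls(cases, controls, keys=MATCH_KEYS, ratio=1, seed=0):
    """Pair each case with up to `ratio` unused controls that share every key value."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for control in controls:
        pool[tuple(control[k] for k in keys)].append(control)
    for bucket in pool.values():
        rng.shuffle(bucket)  # random draw within each stratum
    matched = []
    for case in cases:
        bucket = pool[tuple(case[k] for k in keys)]
        # pop() removes the control from the pool, so it cannot be reused
        picks = [bucket.pop() for _ in range(min(ratio, len(bucket)))]
        matched.append((case, picks))
    return matched

# Toy records, invented for illustration only.
cases = [
    {"id": "p1", "age_band": "70-74", "gender": "F", "index_year": 2015, "util_band": "high"},
]
controls = [
    {"id": "c1", "age_band": "70-74", "gender": "F", "index_year": 2015, "util_band": "high"},
    {"id": "c2", "age_band": "60-64", "gender": "M", "index_year": 2014, "util_band": "low"},
]
pairs = match_controls(cases, controls)
```

In this toy example only control c1 shares all four strata with case p1, so it is the only control that can be selected.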
Results: On a held-out test set, the incident 2-year model had a sensitivity of 47% and an area under the curve (AUC) of 87%. In the 3-year model, the learned labels achieved a sensitivity of 24% (AUC 71%), with sensitivity falling to 15% (AUC 72%) in year 8.
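For reference, the two reported metrics can be computed as in the sketch below. The labels, scores, and the 0.5 decision threshold are invented for illustration and are not taken from the study; how the operating threshold was chosen is not stated above. AUC is computed here from its rank interpretation, the probability that a randomly chosen case scores higher than a randomly chosen control.

```python
def sensitivity(labels, preds):
    """True positive rate: TP / (TP + FN)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return tp / (tp + fn)

def auc(labels, scores):
    """Fraction of (case, control) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data, invented for illustration only.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
threshold = 0.5  # assumed cut-off
preds = [1 if s >= threshold else 0 for s in scores]

sens = sensitivity(labels, preds)  # 1 of 2 cases flagged -> 0.5
area = auc(labels, scores)         # 3 of 4 pairs ordered correctly -> 0.75
```

Sensitivity depends on the chosen threshold, whereas AUC does not, which is why the two can move differently across prediction horizons.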
Discussion: The ability of the model to discriminate incident cases of dementia suggests that it can be a worthwhile tool for screening patients for trial recruitment and for patient management.
Keywords: Alzheimer's disease; Gradient boosting machine; Machine learning; Onset of dementia; Prediction.
© 2019 The Authors.