Develop a diagnostic tool for dementia using machine learning and non-imaging features

Huan Wang; Li Sheng; Shanhu Xu; Yu Jin; Xiaoqing Jin; Song Qiao; Qingqing Chen; Wenmin Xing; Zhenlei Zhao; Jing Yan; Genxiang Mao; Xiaogang Xu

doi:10.3389/fnagi.2022.945274

Front Aging Neurosci. 2022 Aug 29:14:945274. doi: 10.3389/fnagi.2022.945274. eCollection 2022.

Authors

Huan Wang¹, Li Sheng², Shanhu Xu³, Yu Jin³, Xiaoqing Jin³, Song Qiao³, Qingqing Chen⁴, Wenmin Xing⁵, Zhenlei Zhao⁵, Jing Yan⁵, Genxiang Mao⁵, Xiaogang Xu⁵

Affiliations

¹ Department of Biostatistics, The George Washington University, Washington, DC, United States.
² Department of Mathematics, Drexel University, Philadelphia, PA, United States.
³ Department of Neurology, Affiliated Zhejiang Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, China.
⁴ Department of Radiology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China.
⁵ Zhejiang Provincial Key Lab of Geriatrics & Geriatrics Institute of Zhejiang Province, Department of Geriatrics, Affiliated Zhejiang Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, China.

Abstract

Background: Early identification of Alzheimer's disease or mild cognitive impairment can help guide preventive and supportive treatments, improve outcomes, and reduce medical costs. Existing advanced diagnostic tools are mostly based on neuroimaging and have limitations in cost, reliability, repeatability, accessibility, ease of use, and clinical integration. To address these limitations, we developed, evaluated, and implemented an early diagnostic tool using machine learning and non-imaging factors.

Methods and results: A total of 654 participants aged 65 or older from the Nursing Home in Hangzhou, China were identified. Information collected from these participants included dementia status and 70 demographic, cognitive, socioeconomic, and clinical features. Logistic regression, support vector machine (SVM), neural network, random forest, extreme gradient boosting (XGBoost), least absolute shrinkage and selection operator (LASSO), and best subset models were trained, tuned, and internally validated using a novel double cross-validation algorithm and multiple evaluation metrics. The trained models were also compared and externally validated using a separate dataset of 1,100 participants from four communities in Zhejiang Province, China. The best-performing model was then identified and implemented online with a user-friendly interface. For the nursing home dataset, the top three models were the neural network (AUROC = 0.9435), XGBoost (AUROC = 0.9398), and SVM with the polynomial kernel (AUROC = 0.9213). For the community dataset, the best three models were the random forest (AUROC = 0.9259), SVM with the linear kernel (AUROC = 0.9282), and SVM with the polynomial kernel (AUROC = 0.9213). The F1 scores and the area under the precision-recall curve showed that the SVMs, neural network, and random forest were robust on the unbalanced community dataset. Overall, the SVM with the polynomial kernel was found to be the best model. The LASSO and best subset models identified the 17 features most relevant to dementia prediction, mostly from cognitive test results and socioeconomic characteristics.
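The evaluation metrics named above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code: the function names `auroc` and `f1_score` and all labels and scores are hypothetical toy values chosen for illustration. AUROC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and the toy example at the end shows why F1 is reported alongside accuracy-like measures when classes are unbalanced.

```python
def auroc(y_true, scores):
    """AUROC via pairwise comparison: P(score of positive > score of negative).

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and/or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy scores (not study data): 2 positives, 2 negatives.
toy_auroc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Hypothetical unbalanced labels: 5 positive cases among 100 participants.
y_true = [1] * 5 + [0] * 95
all_negative = [0] * 100  # a classifier that never predicts dementia

accuracy = sum(1 for y, p in zip(y_true, all_negative) if y == p) / len(y_true)
f1 = f1_score(y_true, all_negative)

print(f"toy AUROC = {toy_auroc}")
print(f"always-negative classifier: accuracy = {accuracy}, F1 = {f1}")
```

In this toy setting the always-negative classifier looks accurate while its F1 is zero, which is the kind of failure that F1 and precision-recall measures expose on unbalanced data.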

Conclusion: Our non-imaging-based diagnostic tool can effectively predict dementia outcomes and can be conveniently incorporated into clinical practice. Its online implementation removes barriers to its use, which can improve diagnosis, raise the quality of care, and reduce costs.

Keywords: Alzheimer’s disease; dementia; early diagnostic tool; machine learning; non-imaging factors.