Comparison of multiple linear regression and machine learning methods in predicting cognitive function in older Chinese type 2 diabetes patients

BMC Neurol. 2024 Jan 2;24(1):11. doi: 10.1186/s12883-023-03507-w.

Abstract

Introduction: The prevalence of type 2 diabetes (T2D) has increased dramatically in recent decades, and there are growing indications that dementia is related to T2D. Previous attempts to analyze such relationships have relied principally on traditional multiple linear regression (MLR). However, more recently developed machine learning methods (Mach-L) can outperform MLR in capturing non-linear relationships. The present study applied four Mach-L methods to analyze the relationships between risk factors and cognitive function in older T2D patients, with two aims: to compare the accuracy of MLR and Mach-L in predicting cognitive function, and to rank the importance of risk factors for impaired cognitive function in T2D.

Methods: We recruited older T2D patients aged 60 to 95 years without other major comorbidities. Demographic factors and biochemistry data were used as independent variables, and cognitive function assessment (CFA), conducted using the Montreal Cognitive Assessment (MoCA), served as the dependent variable. In addition to traditional MLR, we applied random forest (RF), stochastic gradient boosting (SGB), Naïve Bayes classifier (NB) and eXtreme gradient boosting (XGBoost).
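For reference, the MLR baseline models the CFA score as an intercept plus a weighted sum of the predictors, with the weights chosen by ordinary least squares. The following is a minimal pure-Python sketch under that assumption; the names `fit_mlr` and `predict_mlr` are hypothetical, and the abstract does not describe the software, variable coding or preprocessing the authors actually used.

```python
from typing import List, Sequence


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Fit y ~ b0 + b1*x1 + ... + bp*xp by ordinary least squares.

    Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan
    elimination, where A is X with a leading column of ones (intercept).
    Returns [b0, b1, ..., bp].
    """
    A = [[1.0, *map(float, row)] for row in X]
    k = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(A, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: bring the largest remaining pivot into place.
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular (collinear predictors?)")
        M[col], M[pivot] = M[pivot], M[col]
        # Normalise the pivot row, then eliminate this column from all other rows.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]


def predict_mlr(coef: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Linear prediction b0 + sum(bj * xj) for each row of X."""
    return [coef[0] + sum(b * x for b, x in zip(coef[1:], row)) for row in X]
```

For example, `fit_mlr([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], [2, 5, 1, 4, 7])` recovers coefficients close to `[2, 3, -1]`, because those toy outcomes were generated as 2 + 3·x1 - x2. Solving the normal equations directly is adequate for a handful of predictors; a QR-based solver is numerically safer when predictors are strongly correlated.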

Results: In total, the test cohort consisted of 197 T2D patients (98 men and 99 women). All Mach-L methods outperformed MLR, with symmetric mean absolute percentage errors (SMAPE) for MLR, RF, SGB, NB and XGBoost of 0.61, 0.599, 0.606, 0.599 and 0.2139, respectively. Education level, age, frailty score, fasting plasma glucose and body mass index were identified as key factors, in descending order of importance.
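The abstract does not specify which SMAPE formula was used, and several variants exist. As an assumption, the sketch below uses one common form, the mean of |F - A| divided by (|A| + |F|) / 2, expressed as a fraction rather than a percentage; lower values indicate smaller prediction error. The function name `smape` is illustrative only.

```python
from typing import Sequence


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, returned as a fraction.

    Assumed definition: mean(|F - A| / ((|A| + |F|) / 2)). Pairs where both
    values are zero contribute 0. Variants without the /2, or reported as a
    percentage, give numbers on a different scale.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        total += 0.0 if denom == 0 else abs(f - a) / denom
    return total / len(actual)
```

For example, `smape([20, 25], [22, 25])` gives 1/21 (about 0.048): the first prediction is off by 2 relative to a mean magnitude of 21, and the second is exact.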

Conclusion: Our study demonstrated that RF, SGB, NB and XGBoost predicted CFA scores more accurately than MLR, and identified education level, age, frailty score, fasting plasma glucose, body fat and body mass index as important risk factors in an older Chinese T2D cohort.
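The abstract reports an importance ranking but does not say how importance was computed (tree ensembles such as RF and XGBoost often report impurity- or gain-based importance). Permutation importance is one model-agnostic alternative: shuffle one predictor, re-score the fitted model, and record how much the error grows. The sketch below illustrates that technique only; it is not the authors' procedure, and the function name `permutation_importance`, the choice of mean absolute error and the defaults `n_repeats=10`, `seed=0` are assumptions.

```python
import random
from typing import Callable, List, Sequence


def permutation_importance(
    predict: Callable[[Sequence[Sequence[float]]], List[float]],
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean increase in mean absolute error when each feature column is shuffled.

    A larger increase means the fitted model relies more on that feature.
    """
    def mae(pred: Sequence[float]) -> float:
        return sum(abs(p - t) for p, t in zip(pred, y)) / len(y)

    rng = random.Random(seed)
    baseline = mae(predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row, replacing only feature j with its shuffled value.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            increases.append(mae(predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

To list factors in descending order of importance, sort feature indices by score, e.g. `sorted(range(len(imp)), key=lambda j: imp[j], reverse=True)`. Shuffling one feature at a time can understate the importance of correlated predictors (for instance, body fat and body mass index), because the model may still draw on the unshuffled partner.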

Keywords: Cognitive function; Machine learning; Type 2 diabetes.

MeSH terms

  • Aged
  • Aged, 80 and over
  • Blood Glucose
  • China / epidemiology
  • Cognition
  • Diabetes Mellitus, Type 2* / complications
  • Diabetes Mellitus, Type 2* / epidemiology
  • Female
  • Frailty*
  • Humans
  • Linear Models
  • Machine Learning
  • Male
  • Middle Aged

Substances

  • Blood Glucose