To predict the risk of chronic kidney disease (CKD) using Generalized Additive2 Models (GA2M)

J Am Med Inform Assoc. 2023 Aug 18;30(9):1494-1502. doi: 10.1093/jamia/ocad097.

Abstract

Objective: To train and test a model predicting chronic kidney disease (CKD) using the Generalized Additive2 Model (GA2M), and to compare it with other models obtained with traditional or machine learning approaches.

Materials: We used the Health Search Database (HSD), a representative longitudinal database containing the electronic healthcare records of approximately 2 million adults.

Methods: We selected all patients aged 15 years or older who were active in HSD between January 1, 2018 and December 31, 2020 and had no prior diagnosis of CKD. The following models were trained and tested using 20 candidate determinants of incident CKD: logistic regression, Random Forest, Gradient Boosting Machines (GBMs), GAM, and GA2M. Their predictive performance was compared using the Area Under the Curve (AUC) and Average Precision (AP).
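As an illustration of the two comparison metrics named above, the following is a minimal sketch of how AUC and AP can be computed from predicted risk scores. It is not the authors' code: the function names, the toy labels, and the two sets of scores are hypothetical and were made up for this example; they are not data from the study.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """AP as the mean of the precision values observed at each rank
    where a positive case appears, ranking by descending score.
    Tied scores keep their input order (no special tie handling)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("at least one positive case is required")
    return precision_sum / hits


# Hypothetical toy data: 1 = incident CKD, 0 = no CKD.
y = [1, 0, 1, 0, 0, 0]
scores_model_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # one negative ranked above a positive
scores_model_b = [0.9, 0.2, 0.8, 0.3, 0.4, 0.1]  # positives ranked above all negatives

for name, s in [("model A", scores_model_a), ("model B", scores_model_b)]:
    print(name, round(roc_auc(y, s), 3), round(average_precision(y, s), 3))
```

When the outcome is rare, AP depends on the proportion of positive cases while AUC does not, which is one reason to report the two metrics side by side.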

Results: Among the 7 models compared, GBM and GA2M showed the highest AUC (88.9% and 88.8%, respectively) and AP (21.8% and 21.1%, respectively). These 2 models outperformed the others, including logistic regression. In contrast to GBMs, GA2M retained the interpretability of variable combinations, including the assessment of interactions and nonlinearities.

Discussion: Although GA2M performs slightly below light GBM, it is not a "black-box" algorithm and can be readily interpreted using shape and heatmap functions. This evidence supports the adoption of machine learning techniques when complex algorithms are required, such as those predicting the risk of CKD.
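For orientation, the general form of a GA2M as usually given in the machine learning literature can be written as below. The notation is an assumption introduced here and is not taken from the article: $g$ is a link function, $x_1, \dots, x_p$ are the predictors, and $\mathcal{S}$ is the set of predictor pairs selected for inclusion.

```latex
g\bigl(\mathbb{E}[y]\bigr)
  = \beta_0
  + \sum_{i=1}^{p} f_i(x_i)
  + \sum_{(i,j) \in \mathcal{S}} f_{ij}(x_i, x_j)
```

Each univariate term $f_i$ can be plotted as a shape function and each pairwise term $f_{ij}$ as a heatmap, which is what allows the fitted model to be inspected term by term.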

Conclusion: GA2M performed reliably in predicting CKD in primary care. A related decision support system might therefore be implemented.

Keywords: CKD; EBM; GA2M; prediction model.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Algorithms*
  • Humans
  • Logistic Models
  • Machine Learning
  • Random Forest
  • Renal Insufficiency, Chronic* / diagnosis