Investigation on explainable machine learning models to predict chronic kidney diseases

Samit Kumar Ghosh; Ahsan H Khandoker

doi:10.1038/s41598-024-54375-4


Sci Rep. 2024 Feb 14;14(1):3687. doi: 10.1038/s41598-024-54375-4.

Authors

Samit Kumar Ghosh¹, Ahsan H Khandoker²

Affiliations

¹ Department of Biomedical Engineering & Biotechnology, Khalifa University, Abu Dhabi, United Arab Emirates. samitnitrkl@gmail.com.
² Department of Biomedical Engineering & Biotechnology, Khalifa University, Abu Dhabi, United Arab Emirates.

Abstract

Chronic kidney disease (CKD) is a major global health problem that affects a large proportion of the world's population and contributes to increased morbidity and mortality. Early-stage CKD is often asymptomatic, so many patients are unaware that they have it. Early detection and treatment are therefore critical for reducing complications and improving quality of life for those affected. In this work, we investigate an explainable artificial intelligence (XAI)-based strategy that uses clinical characteristics to predict CKD. We collected data from 491 patients (56 with CKD and 435 without CKD), encompassing clinical, laboratory, and demographic variables. Five machine learning (ML) methods were used to develop the predictive model: logistic regression (LR), random forest (RF), decision tree (DT), Naïve Bayes (NB), and extreme gradient boosting (XGBoost). The optimal model was selected based on accuracy and area under the curve (AUC). The SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) algorithms were then used to show how the features influence the optimal model. Among the five models, XGBoost performed best, with an AUC of 0.9689 and an accuracy of 93.29%. Feature-importance analysis identified creatinine, glycosylated hemoglobin type A1C (HgbA1C), and age as the three most influential features in the XGBoost model. SHAP force plots further visualized the model's individualized CKD predictions, and the LIME algorithm provided additional insight into individual predictions. This study presents an interpretable ML-based approach for the early prediction of CKD. The SHAP and LIME methods enhance the interpretability of ML models and help clinicians understand the rationale behind the predicted outcomes.
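Two quantitative ideas in the abstract can be made concrete with a small, dependency-free Python sketch: comparing candidate models by accuracy and AUC, and the Shapley-value attribution that SHAP is built on. With 56 of 491 patients positive, a classifier that always predicts "no CKD" would already reach 435/491 ≈ 88.6% accuracy, which is one reason to report AUC alongside accuracy. This is an illustration under stated assumptions, not the authors' pipeline: the functions `accuracy`, `roc_auc`, and `exact_shapley`, the toy scoring function `toy_ckd_score`, and all feature values, coefficients, labels, and scores are hypothetical. The Shapley computation enumerates every feature coalition against a single baseline input, which is exact but exponential in the number of features; the SHAP library instead uses model-specific algorithms (for example, a tree-specific algorithm for ensembles such as XGBoost) or sampling-based approximations over background data.

```python
"""Hypothetical, dependency-free sketch: model-selection metrics and exact Shapley values."""
from itertools import combinations
from math import factorial


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def exact_shapley(model, x, baseline):
    """Exact Shapley value of each feature for model(x), relative to one baseline input.

    Features outside a coalition keep their baseline value. Cost grows as 2**n,
    so this is only practical for a handful of features.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for i in names:
        others = [f for f in names if f != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for a coalition of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                without_i = {f: x[f] if f in coalition else baseline[f] for f in names}
                with_i = {**without_i, i: x[i]}
                total += weight * (model(with_i) - model(without_i))
        phi[i] = total
    return phi


def toy_ckd_score(f):
    """Hypothetical risk score with one interaction term (NOT the paper's XGBoost model)."""
    return (0.8 * f["creatinine"] + 0.05 * f["hgba1c"] + 0.01 * f["age"]
            + 0.02 * f["creatinine"] * f["hgba1c"])


# Hypothetical labels and scores, used only to exercise the metrics.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
y_pred = [int(s >= 0.5) for s in scores]
print("accuracy:", round(accuracy(y_true, y_pred), 4))
print("AUC:", round(roc_auc(y_true, scores), 4))  # 8 of 9 positive/negative pairs ranked correctly

# Hypothetical patient and reference values (units omitted for brevity).
patient = {"creatinine": 2.0, "hgba1c": 7.0, "age": 60}
reference = {"creatinine": 1.0, "hgba1c": 5.5, "age": 50}
phi = exact_shapley(toy_ckd_score, patient, reference)
print("Shapley values:", {k: round(v, 4) for k, v in phi.items()})
print("sum vs. score change:", round(sum(phi.values()), 4),
      round(toy_ckd_score(patient) - toy_ckd_score(reference), 4))
```

Because Shapley values satisfy the efficiency property, the three attributions sum exactly to `toy_ckd_score(patient) - toy_ckd_score(reference)`. The interaction term's contribution is shared between creatinine and HgbA1C, which is why neither attribution equals its linear coefficient times its change alone, while age, which enters only linearly, receives exactly that product.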

MeSH terms

Artificial Intelligence*
Bayes Theorem
Calcium Compounds*
Glycated Hemoglobin
Humans
Machine Learning
Oxides*
Quality of Life
Renal Insufficiency, Chronic* / diagnosis

Substances

lime
Glycated Hemoglobin
Oxides
Calcium Compounds

Grants and funding

Award Number: 8474000408/Khalifa University, United Arab Emirates