Development of interpretable machine learning models for prediction of acute kidney injury after noncardiac surgery: a retrospective cohort study

Int J Surg. 2024 Mar 4. doi: 10.1097/JS9.0000000000001237. Online ahead of print.

Abstract

Background: Early identification of patients at high risk of postoperative acute kidney injury (AKI) can facilitate the development of preventive approaches. This study aimed to develop prediction models for postoperative AKI in noncardiac surgery using machine learning algorithms. We also evaluated the predictive performance of models that included only preoperative variables or only important predictors.

Materials and methods: Adult patients undergoing noncardiac surgery were retrospectively included in the study (76,457 patients in the discovery cohort and 11,910 patients in the validation cohort). AKI was defined according to the Kidney Disease: Improving Global Outcomes (KDIGO) criteria. The prediction model was developed using 87 variables (56 preoperative variables and 31 intraoperative variables). Several machine learning algorithms were used to develop the model, including logistic regression, random forest, extreme gradient boosting, and gradient boosting decision trees (GBDT). Model performance was compared using the area under the receiver operating characteristic curve (AUROC). Shapley Additive Explanations (SHAP) analysis was used for model interpretation.
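
To make the comparison metric concrete, the following is a minimal sketch of how an AUROC can be computed from predicted risks and observed outcomes. It uses the rank-based (Mann-Whitney) formulation, in which the AUROC is the probability that a randomly chosen AKI case receives a higher predicted risk than a randomly chosen non-case. The function name `auroc` and the toy data are illustrative assumptions; the abstract does not state which software the authors used.

```python
def auroc(labels, scores):
    """Rank-based AUROC: P(score of a positive > score of a negative), ties count 0.5.

    labels: iterable of 0/1 outcomes (1 = event, e.g. postoperative AKI)
    scores: iterable of predicted risks, same length as labels
    """
    labels = list(labels)
    scores = list(scores)
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    # Assign 1-based ranks by ascending score, averaging ranks within tie groups.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative case")

    # Mann-Whitney U statistic for the positive class, normalized to [0, 1].
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


# Toy example (made-up values, not study data): 2 non-cases and 2 cases.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)  # 0.75: 3 of 4 case/non-case pairs are ordered correctly
```

Because the AUROC depends only on how cases rank against non-cases, it is not distorted by the low event rate in the way accuracy would be, although it says nothing about calibration.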

Results: The patients in the discovery cohort had a median age of 52 years (IQR: 42-61 y), and 1179 patients (1.5%) developed AKI after surgery. The GBDT algorithm showed the best predictive performance both with all available variables and with only preoperative variables, with AUROCs of 0.849 (95% CI, 0.835-0.863) and 0.828 (95% CI, 0.813-0.843), respectively. The SHAP analysis showed that age, surgical duration, preoperative serum creatinine, preoperative gamma-glutamyltransferase, and American Society of Anesthesiologists physical status III were the five most important features. As the number of features was gradually reduced, the AUROC decreased from 0.852 (top 40 features) to 0.839 (top 10 features). A similar pattern of predictive performance was observed in the validation cohort.
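
The feature ranking above rests on Shapley values, which attribute a prediction to individual features by averaging each feature's marginal contribution over all orders in which features can be added. The sketch below illustrates that idea by brute force on a two-feature toy model. It is not the authors' pipeline and not the SHAP library's tree algorithm: the function `toy_risk`, its coefficients, and the reference creatinine value are hypothetical, and absent features are replaced by a single reference value, whereas SHAP implementations typically average over a background dataset.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance.

    predict:  function mapping a dict of feature values to a number
    instance: dict of feature values for the case being explained
    baseline: dict of reference values used when a feature is "absent"
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)          # start with every feature at its reference value
        previous = predict(current)
        for name in ordering:
            current[name] = instance[name]  # switch this feature to the patient's value
            value = predict(current)
            totals[name] += value - previous  # marginal contribution in this ordering
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(x):
    """Hypothetical risk score with an interaction term (coefficients are made up)."""
    return 0.02 * x["age"] + 0.5 * x["creatinine"] + 0.01 * x["age"] * x["creatinine"]


# Reference age is the discovery-cohort median from the abstract; the other values are assumptions.
reference = {"age": 52, "creatinine": 0.9}
patient = {"age": 70, "creatinine": 1.4}

phi = shapley_values(toy_risk, patient, reference)
# The contributions sum to the difference between the patient's score and the reference score.
explained_gap = toy_risk(patient) - toy_risk(reference)
```

Enumerating all orderings is feasible only for a handful of features; with 87 variables, practical tools rely on model-specific or sampling-based approximations. Global importance rankings such as the one reported here are usually obtained by averaging the absolute per-patient values across the cohort.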

Conclusions: The machine learning models we developed showed satisfactory predictive performance for identifying patients at high risk of postoperative AKI. Furthermore, model performance was only slightly reduced when only preoperative variables or only the most important predictive features were included.