Machine learning models for predicting acute kidney injury: a systematic review and critical appraisal

Iacopo Vagliano; Nicholas C Chesnaye; Jan Hendrik Leopold; Kitty J Jager; Ameen Abu-Hanna; Martijn C Schut

doi:10.1093/ckj/sfac181

Clin Kidney J. 2022 Aug 2;15(12):2266-2280. doi: 10.1093/ckj/sfac181. eCollection 2022 Dec.

Authors

Iacopo Vagliano¹, Nicholas C Chesnaye², Jan Hendrik Leopold¹, Kitty J Jager², Ameen Abu-Hanna¹, Martijn C Schut¹

Affiliations

¹ Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands.
² ERA Registry, Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands.

Abstract

Background: The number of studies applying machine learning (ML) to predict acute kidney injury (AKI) has grown steadily over the past decade. We assess and critically appraise the state of the art in ML models for AKI prediction, considering performance, methodological soundness, and applicability.

Methods: We searched PubMed and ArXiv, extracted data, and critically appraised studies based on the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD), Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS), and Prediction Model Risk of Bias Assessment Tool (PROBAST) guidelines.

Results: Forty-six studies from 3166 titles were included. Thirty-eight studies developed a model, five developed and externally validated one, and three studies externally validated one. Flexible ML methods were used more often than deep learning, although the latter was common with temporal variables and text as predictors. Reported areas under the receiver operating characteristic curve ranged from 0.49 to 0.99. Our critical appraisal identified a high risk of bias in 39 studies. Some studies lacked internal validation, whereas external validation and interpretability of results were rarely considered. Fifteen studies focused on AKI prediction in the intensive care setting, and the US-derived Medical Information Mart for Intensive Care (MIMIC) data set was commonly used. Reproducibility was limited as data and code were usually unavailable.

Conclusions: Flexible ML methods are popular for the prediction of AKI, although more complex models based on deep learning are emerging. Our critical appraisal identified a high risk of bias in most models: Studies should use calibration measures and external validation more often, improve model interpretability, and share data and code to improve reproducibility.

Keywords: acute kidney injury; clinical prediction models; critical appraisal; machine learning; systematic review.
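
Illustrative note: the abstract recommends reporting calibration measures alongside discrimination. The following is a minimal sketch of why the two differ. It is not taken from the review or from any included study. The toy outcomes and predicted risks are hypothetical, and the function names (auroc, brier_score, calibration_in_the_large) are chosen here for illustration only. A monotone distortion of the predicted risks leaves the area under the receiver operating characteristic curve unchanged while calibration degrades.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random positive case is ranked above a random
    negative case (ties count as 0.5); equivalent to the area under the ROC curve."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_in_the_large(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean predicted risk minus observed event rate (0 means no systematic
    over- or under-prediction)."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)


# Hypothetical toy data: 1 = AKI occurred, 0 = no AKI.
y = [0, 0, 0, 0, 1, 0, 1, 1]
p = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.70, 0.90]

# Monotone distortion: every risk is inflated, but the ranking is preserved.
p_inflated = [x ** 0.5 for x in p]

for name, probs in [("original", p), ("inflated", p_inflated)]:
    print(
        f"{name}: AUROC={auroc(y, probs):.3f} "
        f"Brier={brier_score(y, probs):.3f} "
        f"calibration-in-the-large={calibration_in_the_large(y, probs):+.3f}"
    )
```

In this sketch both sets of predictions rank patients identically, so discrimination is the same, yet the inflated risks systematically over-predict. A study reporting only the AUROC would not reveal that difference.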