Machine Learning Models to Predict Kidney Stone Recurrence Using 24 Hour Urine Testing and Electronic Health Record-Derived Features

Res Sq [Preprint]. 2023 Jun 29:rs.3.rs-3107998. doi: 10.21203/rs.3.rs-3107998/v1.

Abstract

Objective: To assess the accuracy of machine learning models in predicting kidney stone recurrence using variables extracted from the electronic health record (EHR).

Methods: We trained three machine learning (ML) models (least absolute shrinkage and selection operator regression [LASSO], random forest [RF], and gradient boosted decision tree [XGBoost]) to predict 2-year and 5-year symptomatic kidney stone recurrence from EHR-derived features and 24-hour (24H) urine data (n = 1231). ML models were compared with logistic regression (LR). A manual, retrospective review was performed to identify symptomatic stone events, defined as pain, acute kidney injury, or recurrent infections attributed to a kidney stone identified in the clinic or the emergency department, or any stone requiring surgical treatment. We evaluated performance using the area under the receiver operating characteristic curve (AUC-ROC) and identified important features for each model.
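For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how AUC-ROC can be computed from binary outcomes and model scores, using the rank-based (Mann-Whitney U) formulation. It is not the authors' code; the function name auc_roc and the toy labels and scores are hypothetical and chosen purely for illustration.

```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC via the rank (Mann-Whitney U) formulation.

    Equals the probability that a randomly chosen positive case (label 1)
    receives a higher score than a randomly chosen negative case (label 0),
    with tied scores counted as one half.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")

    # Sort indices by score, then give tied scores their average 1-based rank.
    order = sorted(range(len(scores)), key=lambda idx: scores[idx])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the positive class, normalized by the number of pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Hypothetical toy data: 1 = recurrence, 0 = no recurrence.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc_roc(toy_labels, toy_scores))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values should be read.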

Results: The 2- and 5-year symptomatic stone recurrence rates were 25% and 31%, respectively. The LASSO model performed best for symptomatic stone recurrence prediction (2-yr AUC: 0.62, 5-yr AUC: 0.63). The other models demonstrated modest performance at 2 and 5 years: LR (0.585, 0.618), RF (0.570, 0.608), and XGBoost (0.580, 0.621). Patient age was the only feature among the top 5 features of every model. Additionally, the LASSO model prioritized BMI and history of gout for prediction.

Conclusions: Across our cohorts, ML models performed comparably to LR, with the LASSO model outperforming all other models. Further model testing should evaluate the utility of 24H urine features in model structure.

Keywords: machine learning; recurrence; stone; urolithiasis.

Publication types

  • Preprint