Prognostic Machine Learning Models for First-Year Mortality in Incident Hemodialysis Patients: Development and Validation Study

JMIR Med Inform. 2020 Oct 29;8(10):e20578. doi: 10.2196/20578.

Abstract

Background: The first-year survival rate among patients undergoing hemodialysis remains poor. Current mortality risk scores for patients undergoing hemodialysis employ regression techniques and have limited applicability and robustness.

Objective: We aimed to develop a machine learning model that uses clinical factors to predict first-year mortality in patients undergoing hemodialysis and could help physicians identify high-risk patients.

Methods: The training cohort consisted of 5351 patients with incident hemodialysis from a single center, and the testing cohort consisted of 5828 patients with incident hemodialysis from 97 renal centers. The outcome was all-cause mortality during the first year of dialysis. Extreme gradient boosting (XGBoost) was used for algorithm training and validation. Two models were built, one based on data obtained at dialysis initiation (model 1) and one based on data from 0-3 months after dialysis initiation (model 2), and 10-fold cross-validation was applied to each model. The area under the curve (AUC), sensitivity (recall), specificity, precision, balanced accuracy, and F1 score were used to assess the predictive ability of the models.
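To make the evaluation metrics concrete, the following is a minimal, standard-library sketch of how they can be computed from binary outcomes (1 = died within the first year) and model risk scores. It is not the authors' code. The function names, the toy data, and the 0.5 decision threshold are illustrative assumptions; the abstract does not state which threshold was used. AUC is computed here with the rank-based (Mann-Whitney) definition.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = died in first year."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-dependent metrics; each ratio falls back to 0.0 if its denominator is 0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    balanced_accuracy = (sensitivity + specificity) / 2
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: probability that a randomly chosen death receives a higher
    score than a randomly chosen survivor (ties count 1/2). O(n_pos * n_neg),
    which is fine for illustration but slow on large cohorts."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 6 patients, 3 of whom died.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]
THRESHOLD = 0.5  # assumed cut-off for illustration only
y_pred = [1 if s >= THRESHOLD else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
roc_auc = auc(y_true, scores)
```

On this toy data each threshold-dependent metric equals 2/3 and the AUC is 8/9. Balanced accuracy is worth reporting alongside plain accuracy because first-year deaths are a minority class (roughly 11% to 13% in these cohorts), so a model that predicts "survives" for everyone would look deceptively accurate.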

Results: In the training and testing cohorts, 585 (10.93%) and 764 (13.11%) patients, respectively, died during the first-year follow-up. Of 42 candidate features, the 15 most important were selected. The performance of model 1 (AUC 0.83, 95% CI 0.78-0.84) was similar to that of model 2 (AUC 0.85, 95% CI 0.81-0.86).
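The abstract does not say how feature importance was measured. The sketch below assumes that each candidate feature already has a numeric importance score (for example, a gain-type score from a fitted gradient boosting model) and shows only the ranking-and-truncation step that keeps the top k features. The feature names and scores are hypothetical placeholders, not the study's variables.

```python
from typing import Dict, List


def select_top_k(importance: Dict[str, float], k: int) -> List[str]:
    """Return the k feature names with the highest importance scores.

    Ties are broken alphabetically so the selection is deterministic.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical importance scores for 5 candidate features (the study had 42
# candidates and kept 15; these names and values are placeholders).
importance = {
    "feat_a": 0.05,
    "feat_b": 0.30,
    "feat_c": 0.12,
    "feat_d": 0.30,
    "feat_e": 0.01,
}
selected = select_top_k(importance, k=3)
```

Here `selected` is `["feat_b", "feat_d", "feat_c"]`. Applied to the study's setting, the same call would use 42 scores with `k=15`.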

Conclusions: We developed and validated 2 machine learning models to predict first-year mortality in patients undergoing hemodialysis. Both models could be used to stratify high-risk patients at the early stages of dialysis.

Keywords: XGBoost; hemodialysis; machine learning; prediction model.