What drives performance in machine learning models for predicting heart failure outcome?

Rom Gutman; Doron Aronson; Oren Caspi; Uri Shalit

doi:10.1093/ehjdh/ztac054

Eur Heart J Digit Health. 2022 Sep 30;4(3):175-187. doi: 10.1093/ehjdh/ztac054. eCollection 2023 May.

Authors

Rom Gutman¹, Doron Aronson²,³, Oren Caspi²,³, Uri Shalit¹

Affiliations

¹ William Davidson Faculty of Industrial Engineering and Management, Technion, Haifa, Israel.
² Department of Cardiology, Rambam Health Care Campus.
³ The Bruce Rappaport Faculty of Medicine, Technion, Haifa, Israel.

Abstract

Aims: The development of acute heart failure (AHF) is a critical decision point in the natural history of the disease and carries a dismal prognosis. The lack of appropriate risk-stratification tools at hospital discharge of AHF patients significantly limits the clinical ability to tailor patient-specific therapeutic regimens at this pivotal juncture. Machine learning-based strategies may improve risk stratification by incorporating analysis of high-dimensional patient data with multiple covariates and novel prediction methodologies. In the current study, we aimed to evaluate the drivers of success in prediction models and to establish an institute-tailored artificial intelligence-based prediction model for real-time decision support.

Methods and results: We used a cohort of all 10 868 AHF patients admitted to a tertiary hospital during a 12-year period. A total of 372 covariates were collected from admission to the end of the hospitalization. We assessed model performance across two axes: (i) type of prediction method and (ii) type and number of covariates. The primary outcome was 1-year survival from hospital discharge. For the model-type axis, we experimented with seven different methods: logistic regression (LR) with either L₁ or L₂ regularization, random forest (RF), Cox proportional hazards model (Cox), extreme gradient boosting (XGBoost), a deep neural net (NeuralNet), and an ensemble classifier of all the above methods. We achieved an area under the receiver operating characteristic curve (AUROC) of more than 80% with most prediction models, including L₁/L₂-LR (80.4%/80.3%), Cox (80.2%), XGBoost (80.5%), and NeuralNet (80.4%). RF was inferior to the other methods (78.8%), and the ensemble model was slightly superior (81.2%). The number of covariates was a significant modifier (P < 0.001) of prediction success: the use of multiplex covariates performed significantly better (AUROC 80.4% for L₁-LR) than a set of known clinical covariates (AUROC 77.8%). Demographics, followed by lab tests and administrative data, resulted in the largest gain in model performance.

Conclusions: The choice of predictive modelling method is secondary to the multiplicity and type of covariates for predicting AHF prognosis. Applying structured data pre-processing combined with the use of multiple covariates results in accurate, institute-tailored risk prediction in AHF.
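
For readers less familiar with the evaluation metric, the following is a minimal Python sketch of how an AUROC can be computed from predicted risks, and of how predictions from several models might be combined. It is illustrative only and is not the authors' code: the function names `auroc` and `average_ensemble`, the toy labels and scores, and the choice of an unweighted average as the ensembling rule are all assumptions, since the abstract does not state how the ensemble classifier combines its members.

```python
def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation.

    labels: iterable of 0/1 outcomes (1 = event).
    scores: iterable of predicted risks; higher means more likely to be 1.
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the current group of tied scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_ensemble(*model_scores):
    """Unweighted mean of per-patient predicted risks across models.

    This averaging rule is an assumption made for illustration.
    """
    return [sum(column) / len(column) for column in zip(*model_scores)]


# Toy, made-up data: six patients and two hypothetical models.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_model_a = [0.9, 0.4, 0.6, 0.5, 0.3, 0.2]
toy_model_b = [0.7, 0.6, 0.8, 0.1, 0.5, 0.3]

for name, risks in [
    ("model A", toy_model_a),
    ("model B", toy_model_b),
    ("ensemble", average_ensemble(toy_model_a, toy_model_b)),
]:
    print(name, round(auroc(toy_labels, risks), 3))
```

The rank-based form is used here because it needs no external libraries and handles tied scores; in practice a library routine would normally be preferred.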