Comparison of machine learning methods with logistic regression analysis in creating predictive models for risk of critical in-hospital events in COVID-19 patients on hospital admission

Aaron W Sievering; Peter Wohlmuth; Nele Geßler; Melanie A Gunawardene; Klaus Herrlinger; Berthold Bein; Dirk Arnold; Martin Bergmann; Lorenz Nowak; Christian Gloeckner; Ina Koch; Martin Bachmann; Christoph U Herborn; Axel Stang

doi:10.1186/s12911-022-02057-4


BMC Med Inform Decis Mak. 2022 Nov 28;22(1):309. doi: 10.1186/s12911-022-02057-4.

Authors

Aaron W Sievering¹, Peter Wohlmuth¹ ², Nele Geßler¹ ² ³, Melanie A Gunawardene³, Klaus Herrlinger⁴ ⁵, Berthold Bein⁶, Dirk Arnold⁵ ⁷, Martin Bergmann⁸, Lorenz Nowak⁹, Christian Gloeckner¹⁰, Ina Koch¹¹, Martin Bachmann¹², Christoph U Herborn¹ ¹³, Axel Stang¹⁴ ¹⁵ ¹⁶

Affiliations

¹ Semmelweis University, Asklepios Campus Hamburg, Budapest, Hungary.
² Asklepios Proresearch, Research Institute, Hamburg, Germany.
³ Department of Cardiology and Intensive Care Medicine, Asklepios Hospital St. Georg, Hamburg, Germany.
⁴ Department of Internal Medicine, Asklepios Hospital Nord-Heidberg, Hamburg, Germany.
⁵ Asklepios Tumorzentrum, Hamburg, Germany.
⁶ Department of Anesthesiology and Intensive Care Medicine, Asklepios Hospital St. Georg, Hamburg, Germany.
⁷ Department of Hematology, Oncology, Palliative Care and Rheumatology, Asklepios Hospital Altona, Hamburg, Germany.
⁸ Department of Internal Medicine, Cardiology, and Pneumology, Asklepios Hospital Wandsbek, Hamburg, Germany.
⁹ Department of Intensive Care and Ventilation Medicine, Asklepios Hospital München-Gauting, Gauting, Germany.
¹⁰ Department of Internal Medicine, Asklepios Hospital Oberviechtach, Oberviechtach, Germany.
¹¹ Biobank for Pulmonary Diseases, Asklepios Hospital München-Gauting, Gauting, Germany.
¹² Department of Intensive Care and Ventilatory Medicine, Asklepios Hospital Harburg, Hamburg, Germany.
¹³ Asklepios Hospitals GmbH & Co. KGaA, Hamburg, Germany.
¹⁴ Semmelweis University, Asklepios Campus Hamburg, Budapest, Hungary. a.stang@asklepios.com.
¹⁵ Asklepios Tumorzentrum, Hamburg, Germany. a.stang@asklepios.com.
¹⁶ Department of Hematology, Oncology and Palliative Care Medicine, Asklepios Hospital Barmbek, Rübenkamp 220, 22291, Hamburg, Germany. a.stang@asklepios.com.

Abstract

Background: Machine learning (ML) algorithms have been trained to predict critical in-hospital events from COVID-19 early, using patient data at admission, but little is known about how their performance compares with each other and/or with statistical logistic regression (LR). This prospective multicentre cohort study compares the performance of an LR model and five ML models, and examines how influential predictors and predictor-to-event relationships contribute to each prediction model's performance.

Methods: We used 25 baseline variables of 490 COVID-19 patients admitted to 8 hospitals in Germany (March-November 2020) to develop and validate (75/25 random split) 3 linear (L1 penalty, L2 penalty, elastic net [EN]) and 2 non-linear (support vector machine [SVM] with radial kernel, random forest [RF]) ML approaches for predicting critical events, defined as intensive care unit transfer, invasive ventilation and/or death (composite end-point: 181 patients). Models were compared for performance (area under the receiver operating characteristic curve [AUC], Brier score) and predictor importance (performance-loss metrics, partial-dependence profiles).
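For readers less familiar with the two performance measures named in the Methods, the following is a minimal sketch of how AUC and the Brier score can be computed from predicted risks and observed 0/1 outcomes. It is not the study's code: the function names and the toy values are illustrative assumptions, and the AUC uses the standard pairwise (Mann-Whitney) formulation.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Share of (event, non-event) pairs in which the event case has the
    higher predicted risk; tied pairs count as one half."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(events) * len(non_events))


# Toy illustration with made-up values (not study data).
toy_outcomes = [1, 0, 1, 0]
toy_risks = [0.9, 0.2, 0.4, 0.6]
toy_auc = auc(toy_outcomes, toy_risks)
toy_brier = brier_score(toy_outcomes, toy_risks)
```

Higher AUC indicates better discrimination between patients with and without an event, whereas a lower Brier score indicates predicted risks that lie closer to the observed outcomes.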

Results: Model performance was similar, with a small benefit for LR (using restricted cubic splines for non-linearity) and RF (mean AUC range: 0.763-0.731 [RF-L1]; Brier score range: 0.184-0.197 [LR-L1]). Top-ranked predictor variables (consistently highest importance: C-reactive protein) were largely identical across models, except for creatinine, which exhibited marginal (L1, L2, EN, SVM) or high/non-linear effects (LR, RF) on events.
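Predictor rankings based on performance loss are commonly obtained by permuting one predictor at a time and measuring how much a model's loss increases. The sketch below illustrates that general idea under stated assumptions; the abstract does not specify the exact procedure used, and the names `permutation_loss` and `toy_predict`, the choice of the Brier score as the loss, and the toy data are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def brier(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def permutation_loss(
    predict: Callable[[Row], float],
    rows: List[Row],
    y_true: Sequence[int],
    column: int,
    n_repeats: int = 20,
    seed: int = 0,
) -> float:
    """Average increase in Brier score after shuffling one predictor column."""
    rng = random.Random(seed)
    baseline = brier(y_true, [predict(r) for r in rows])
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with only the chosen column replaced.
        permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
        increases.append(brier(y_true, [predict(r) for r in permuted]) - baseline)
    return sum(increases) / n_repeats


def toy_predict(row: Row) -> float:
    """Hypothetical model: risk depends on column 0 only."""
    return 0.9 if row[0] > 0.5 else 0.1


# Made-up data: column 0 is informative, column 1 is unrelated noise.
toy_rows = [[1.0, 3.0], [0.0, 7.0], [1.0, 2.0], [0.0, 5.0],
            [1.0, 9.0], [0.0, 1.0], [1.0, 4.0], [0.0, 6.0]]
toy_y = [1, 0, 1, 0, 1, 0, 1, 0]

loss_informative = permutation_loss(toy_predict, toy_rows, toy_y, column=0)
loss_noise = permutation_loss(toy_predict, toy_rows, toy_y, column=1)
```

In this toy setting, shuffling the informative column degrades the Brier score, while shuffling the noise column leaves it unchanged, which is the pattern a performance-loss ranking is meant to expose.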

Conclusions: Although the LR and ML models analysed showed no strong differences in performance or in the most influential predictors for COVID-19-related event prediction, our results indicate a predictive benefit from accounting for non-linear predictor-to-event relationships and effects. Future efforts should focus on moving data-driven ML technologies from static towards dynamic modelling solutions that continuously learn and adapt to changes in data environments during the evolving pandemic.

Trial registration number: NCT04659187.

Keywords: COVID-19; Clinical decision-making; Critical event prediction; Machine learning; Predictive models.

MeSH terms

COVID-19*
Cohort Studies
Hospitals
Humans
Logistic Models
Machine Learning
Prospective Studies

Associated data

ClinicalTrials.gov/NCT04659187