Machine Learning Predictive Modeling for the Identification of Moderate Coronavirus Disease 2019 During the Pandemic: A Retrospective Study

Cureus. 2023 Dec 16;15(12):e50619. doi: 10.7759/cureus.50619. eCollection 2023 Dec.

Authors

Tao Wang¹, Zhanqing Zhao², Wenzhe Li³, Jing Wu³, Qianru Ye³, Hui Xie¹

Affiliations

¹ Department of Critical Care Medicine, Shanghai General Hospital, Shanghai, CHN.
² Department of Critical Care Medicine, Hainan Western Central Hospital, Danzhou, CHN.
³ Department of Critical Care Medicine, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, CHN.

Abstract

Background: Timely differentiation of moderate COVID-19 cases from mild cases supports early treatment and conserves medical resources during the pandemic. We aimed to construct a model to predict the occurrence of moderate COVID-19 through a retrospective study.

Methods: In this retrospective study, clinical data were collected from patients with COVID-19 admitted to Hainan Western Central Hospital in Danzhou, China, between August 1, 2022, and August 31, 2022. The data included sex, age, signs on admission, comorbidities, imaging data, post-admission treatment, length of stay, and the results of laboratory tests on admission. The patients were classified into mild and moderate groups according to WHO guidance. Factors that differed between the groups were entered into machine learning models such as Bernoulli Naïve Bayes (BNB), linear discriminant analysis (LDA), support vector machine (SVM), least absolute shrinkage and selection operator (LASSO), and logistic regression (LR) models. These models were compared to select the one with the best predictive efficacy for moderate COVID-19. Predictive performance was assessed using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and calibration plots.
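As an illustration of the discrimination metrics named in the Methods, a minimal Python sketch follows. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 cutoff are assumptions introduced here, and the abstract does not state how the classification threshold was chosen.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data invented for illustration only (1 = moderate, 0 = mild).
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.10, 0.20, 0.30, 0.40, 0.35, 0.60, 0.70, 0.90]

auc = roc_auc(y_true, scores)
# The 0.5 cutoff is an assumption, not a value taken from the study.
sens, spec = sensitivity_specificity(y_true, scores, threshold=0.5)
print(f"AUC={auc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

The pairwise AUC definition is used here because it needs no external library; for real datasets an established implementation would be preferable.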

Results: A total of 231 patients with COVID-19 were included in this retrospective analysis. Among them, 159 (68.83%) had mild disease and 72 (31.17%) had moderate disease; no patients had severe or critical disease. A logistic regression model combining age, respiratory rate (RR), lactate dehydrogenase (LDH), D-dimer, and albumin was selected to predict the occurrence of moderate COVID-19. Receiver operating characteristic (ROC) curve analysis showed that the AUC, sensitivity, and specificity of the model were 0.719, 0.681, and 0.635, respectively. Calibration curve analysis showed that the predicted probabilities were in good agreement with the observed probabilities. In stratified analysis, a model fitted for patients aged ≤66 years showed better discrimination (AUC = 0.7656) and better calibration.
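For reference, a logistic regression model with these five predictors has the general form below. The coefficient symbols are placeholders; the abstract does not report fitted coefficient values, units, or any variable transformations.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p}
  = \beta_0 + \beta_1\,\mathrm{Age} + \beta_2\,\mathrm{RR}
  + \beta_3\,\mathrm{LDH} + \beta_4\,\text{D-dimer} + \beta_5\,\mathrm{Albumin},
\qquad
p = \frac{1}{1 + e^{-\operatorname{logit}(p)}}
```

Here p denotes the predicted probability that a patient has moderate rather than mild COVID-19.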

Conclusion: An LR model combining age, RR, D-dimer, LDH, and albumin can predict the occurrence of moderate COVID-19 well, especially in patients aged ≤66 years.

Keywords: machine learning; moderate covid-19; pandemic; predictive model; retrospective study.