Automated prediction of COVID-19 mortality outcome using clinical and laboratory data based on hierarchical feature selection and random forest classifier

Comput Methods Biomech Biomed Engin. 2023 Feb;26(2):160-173. doi: 10.1080/10255842.2022.2050906. Epub 2022 Mar 17.

Abstract

Early prediction of COVID-19 mortality outcome can reduce the risk of death by alerting healthcare personnel, helping to ensure efficient resource allocation and treatment planning. This study introduces a machine learning framework for predicting COVID-19 mortality using demographics, vital signs, and laboratory blood tests (complete blood count (CBC), coagulation, kidney, liver, blood gas, and general). Forty-one features were recorded from 244 COVID-19 patients on the first day of admission. First, the features in each of the eight categories were investigated separately. Next, features with an area under the receiver operating characteristic curve (AUC) above 0.6 and a Wilcoxon rank-sum test p-value below 0.005 were retained for further analysis. Then, five feature reduction methods (forward feature selection, minimum redundancy maximum relevance, ReliefF, linear discriminant analysis, and neighborhood component analysis) were used to select the best combination of features. Finally, seven classifiers (random forest (RF), support vector machine, logistic regression (LR), K-nearest neighbors, artificial neural network, bagging, and boosting) were used to predict the mortality outcome of COVID-19 patients. The results showed that the CBC features achieved the highest mortality classification performance, followed by the vital signs. Furthermore, the RF classifier with hierarchical feature selection via forward feature selection had the highest classification power, with an accuracy of 92.08 ± 2.56%. Therefore, the proposed method can be used as a valuable assistive prognostic tool to identify patients at high mortality risk.
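The two-stage selection described in the abstract (a univariate AUC and rank-sum filter, then greedy forward selection) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: all function names are hypothetical; the AUC is derived from the Mann-Whitney U statistic; the rank-sum p-value uses the two-sided normal approximation with tie correction; the 0.6 threshold is applied direction-agnostically as max(AUC, 1 − AUC), which the abstract does not specify; and `score` is a hypothetical callback standing in for, e.g., the cross-validated accuracy of a classifier such as RF.

```python
import math

def rank_with_ties(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def auc_and_ranksum_p(x, y):
    """AUC of feature x for binary labels y (1 = died, 0 = survived) and a
    two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    ranks = rank_with_ties(x)
    n1 = sum(y)
    n0 = len(y) - n1
    r1 = sum(r for r, label in zip(ranks, y) if label == 1)
    u1 = r1 - n1 * (n1 + 1) / 2
    auc = u1 / (n1 * n0)  # AUC = U / (n1 * n0)
    n = n1 + n0
    counts = {}
    for v in x:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n0 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return auc, 1.0
    z = (u1 - n1 * n0 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))
    return auc, p

def screen_features(features, y, auc_min=0.6, p_max=0.005):
    """Keep features passing both univariate criteria (assumed direction-agnostic AUC)."""
    selected = []
    for name, x in features.items():
        auc, p = auc_and_ranksum_p(x, y)
        if max(auc, 1 - auc) > auc_min and p < p_max:
            selected.append(name)
    return selected

def forward_selection(candidates, score, max_features=None):
    """Greedy sequential forward selection: repeatedly add the candidate that
    maximizes score(subset); stop when no candidate improves the best score.
    Score ties are broken by the larger candidate name."""
    selected, best = [], float("-inf")
    remaining = list(candidates)
    while remaining and (max_features is None or len(selected) < max_features):
        top_score, top_name = max((score(selected + [c]), c) for c in remaining)
        if top_score <= best:
            break
        selected.append(top_name)
        remaining.remove(top_name)
        best = top_score
    return selected, best

# Toy usage with synthetic data (not the study's data).
y = [0] * 10 + [1] * 10
features = {
    "a": list(range(20)),            # higher in non-survivors
    "b": [1, 2] * 10,                # same distribution in both groups
    "c": list(range(19, -1, -1)),    # lower in non-survivors
}
kept = screen_features(features, y)  # ['a', 'c']
weights = {"a": 5, "b": -1, "c": 3}  # hypothetical stand-in for classifier accuracy
subset, best = forward_selection(features, lambda s: sum(weights[f] for f in s))
```

In practice the `score` callback would wrap model training and validation; keeping it as a parameter separates the greedy search from the choice of classifier.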

Keywords: COVID-19; forward feature selection; laboratory features; mortality prediction; random forest.

MeSH terms

  • Algorithms
  • COVID-19* / diagnosis
  • Humans
  • Neural Networks, Computer
  • ROC Curve
  • Random Forest