A Hybrid Feature Selection Approach to Screen a Novel Set of Blood Biomarkers for Early COVID-19 Mortality Prediction

Diagnostics (Basel). 2022 Jun 30;12(7):1604. doi: 10.3390/diagnostics12071604.

Abstract

The increase in coronavirus disease 2019 (COVID-19) infection caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has placed pressure on healthcare services worldwide. Therefore, it is crucial to identify critical factors for assessing the severity of COVID-19 infection and optimizing individual treatment strategies. In this regard, the present study leverages a dataset of blood samples from 485 COVID-19 individuals in the region of Wuhan, China to identify essential blood biomarkers that predict the mortality of COVID-19 individuals. For this purpose, a hybrid feature selection approach combining filter, statistical, and heuristic-based methods was used to select the best subset of informative features. As a result, minimum redundancy maximum relevance (mRMR), a two-tailed unpaired t-test, and the whale optimization algorithm (WOA) were eventually selected, and together they identified the three most informative blood biomarkers: international normalized ratio (INR), platelet large cell ratio (P-LCR), and D-dimer. In addition, various machine learning (ML) algorithms (random forest (RF), support vector machine (SVM), extreme gradient boosting (EGB), naïve Bayes (NB), logistic regression (LR), and k-nearest neighbor (KNN)) were trained. The performance of the trained models was compared to determine the model that predicts the mortality of COVID-19 individuals with the highest accuracy, F1 score, and area under the curve (AUC) values. In this paper, the best-performing RF-based model, built using the three most informative blood parameters, predicts the mortality of COVID-19 individuals with an accuracy of 0.96 ± 0.062, an F1 score of 0.96 ± 0.099, and an AUC value of 0.98 ± 0.024 on the independent test data. Furthermore, the performance of our proposed RF-based model in terms of accuracy, F1 score, and AUC was significantly better than that of the known blood biomarker-based ML models built using the Pre_Surv_COVID_19 data.
Therefore, the present study provides a novel hybrid approach for screening the most informative blood biomarkers and developing an RF-based model that accurately and reliably predicts in-hospital mortality of confirmed COVID-19 individuals during surge periods. An application based on our proposed model was implemented and deployed on Heroku.
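
The idea behind the hybrid screening summarized in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a t-statistic cutoff in place of a p-value, uses Pearson correlation as a simple stand-in for the mutual-information terms of mRMR, and omits the WOA stage entirely. The function names (`t_statistic`, `pearson`, `select_features`), the threshold, and the synthetic feature values are all hypothetical and are not taken from the study data.

```python
import math
import random
from statistics import mean, variance


def t_statistic(a, b):
    """Unpaired Student t statistic with pooled variance (equal-variance form)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))
    return num / den


def select_features(columns, labels, k, t_threshold=2.0):
    """Keep features whose |t| passes the cutoff, then greedily pick up to k
    of them, discounting each candidate's relevance by its mean absolute
    correlation with the features already chosen (an mRMR-like heuristic)."""
    relevance = {}
    for name, values in columns.items():
        group1 = [v for v, y in zip(values, labels) if y == 1]
        group0 = [v for v, y in zip(values, labels) if y == 0]
        t = abs(t_statistic(group1, group0))
        if t >= t_threshold:  # assumed cutoff, standing in for a p-value test
            relevance[name] = t

    selected = []
    while relevance and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = mean(abs(pearson(columns[name], columns[s])) for s in selected)
            return relevance[name] * (1.0 - redundancy)

        best = max(relevance, key=score)
        selected.append(best)
        del relevance[best]
    return selected


# Synthetic illustration only: values are random, not patient measurements.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
d_dimer = [rng.gauss(3.0 * y, 1.0) for y in labels]      # strongly informative
d_dimer_copy = [v + rng.gauss(0.0, 0.1) for v in d_dimer]  # nearly redundant copy
inr = [rng.gauss(1.5 * y, 1.0) for y in labels]          # moderately informative
noise = [rng.gauss(0.0, 1.0) for _ in labels]            # uninformative

columns = {"d_dimer": d_dimer, "d_dimer_copy": d_dimer_copy, "inr": inr, "noise": noise}
selected = select_features(columns, labels, k=2)
print(selected)
```

In this toy setup, the redundancy discount is what keeps the near-duplicate feature from being chosen alongside its original, which is the motivation for pairing a relevance test with a redundancy-aware filter. A classifier such as a random forest would then be trained on the selected columns only.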

Keywords: COVID-19; blood biomarkers; filter-based feature selection; hybrid-feature selection; machine learning models; meta-heuristic method; mortality risk prediction; two-tailed unpaired t-test.