Statistical analysis of blood characteristics of COVID-19 patients and their survival or death prediction using machine learning algorithms

Neural Comput Appl. 2022;34(17):14729-14743. doi: 10.1007/s00521-022-07325-y. Epub 2022 May 11.

Abstract

The main purpose of this study is to extract useful information from blood samples of COVID-19 patients, as a non-medical approach to supporting healthcare systems during the pandemic. The paper also evaluates machine learning algorithms for predicting the survival or death of COVID-19 patients. We use a blood sample dataset of 306 infected patients in Wuhan, China, compiled by Tongji Hospital. The dataset consists of clinical blood indicators and a record of whether each patient recovered. The methods used include K-nearest neighbor (KNN), decision tree (DT), logistic regression (LR), support vector machine (SVM), random forest (RF), stochastic gradient descent (SGD), bagging classifier (BC), and adaptive boosting (AdaBoost). We compare the performance of the machine learning algorithms using statistical hypothesis testing. The results show that the most critical feature is age, and that there are high correlations between lactate dehydrogenase (LD) and C-reactive protein (CRP), and between leukocytes and CRP. Furthermore, RF, SVM, DT, AdaBoost, and KNN outperform the other machine learning algorithms in predicting the survival or death of COVID-19 patients.
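To illustrate what comparing two classifiers with a statistical hypothesis test can look like, the following is a minimal sketch in Python. The abstract does not name the test that was used, so the exact paired sign-flip permutation test here is an assumption chosen for illustration. The function name `paired_sign_flip_test` is hypothetical, and the per-fold accuracies are made-up values, not results from the study.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided paired permutation (sign-flip) test.

    Under the null hypothesis that both classifiers perform equally well,
    the sign of each per-fold score difference is arbitrary. The p-value is
    the share of all sign patterns whose absolute mean difference is at
    least as large as the observed one.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    if not scores_a:
        raise ValueError("score lists must not be empty")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))

    extreme = 0
    total = 0
    # Enumerate every sign pattern (2**n of them, fine for a handful of folds).
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        permuted = abs(mean([s * d for s, d in zip(signs, diffs)]))
        # Small tolerance guards against floating-point rounding.
        if permuted >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical 5-fold cross-validation accuracies (illustrative only).
rf_acc = [0.90, 0.92, 0.88, 0.91, 0.93]
sgd_acc = [0.85, 0.86, 0.84, 0.88, 0.87]

p_value = paired_sign_flip_test(rf_acc, sgd_acc)
# Only the all-positive and all-negative patterns are as extreme: 2/32.
print(f"p-value: {p_value:.4f}")  # p-value: 0.0625
```

With only five folds the smallest attainable two-sided p-value is 2/32, so a real comparison would need more folds or repeated cross-validation before any conclusion at a conventional significance level could be drawn.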

Keywords: Blood sample; COVID-19; Healthcare system; Machine learning; Statistical analysis.