Comprehensive analysis of clinical data for COVID-19 outcome estimation with machine learning models

Biomed Signal Process Control. 2023 Jul:84:104818. doi: 10.1016/j.bspc.2023.104818. Epub 2023 Mar 9.

Abstract

COVID-19 is a global threat to healthcare systems because of the rapid spread of the pathogen that causes it. In such a situation, clinicians must make important decisions in an environment where medical resources can be insufficient. Computer-aided diagnosis systems can be very useful here, not only for supporting clinical decisions but also for performing relevant analyses that allow clinicians to better understand the disease and the factors that identify high-risk patients. For those purposes, in this work we use several machine learning algorithms to estimate the outcome of COVID-19 patients given their clinical information. In particular, we perform two different studies: the first estimates whether the patient is at low or high risk of death, whereas the second estimates whether the patient needs hospitalization. The results of these analyses show the most relevant features for each studied scenario, as well as the classification performance of the considered machine learning models. In particular, the XGBoost algorithm is able to estimate the need for hospitalization of a patient with an AUC-ROC of 0.8415 ± 0.0217, while it can also estimate the risk of death with an AUC-ROC of 0.7992 ± 0.0104. The results demonstrate the great potential of the proposal to identify those patients who need a greater amount of medical resources because they are at higher risk. This provides healthcare services with a tool to better manage their resources.

Keywords: COVID-19; Classification; Clinical data; Feature selection; Machine learning.
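
The abstract reports AUC-ROC values in the form mean ± spread. As a minimal sketch of how a figure of that shape could be produced, the snippet below computes the AUC-ROC for each evaluation fold with the rank-based (Mann-Whitney) formulation and then summarizes the folds. The abstract does not state how the ± term was obtained, so the use of the sample standard deviation across folds is an assumption, and the function names and fold data are hypothetical illustrations, not the authors' pipeline or results.

```python
from statistics import mean, stdev


def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def summarize(fold_aucs):
    """Mean and sample standard deviation across folds (assumed convention)."""
    return mean(fold_aucs), stdev(fold_aucs)


# Hypothetical per-fold (true labels, predicted scores); not data from the paper.
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    ([1, 1, 0, 0], [0.8, 0.7, 0.3, 0.1]),
    ([0, 1, 0, 1], [0.4, 0.4, 0.1, 0.9]),
]

fold_aucs = [auc_roc(y, s) for y, s in folds]
m, sd = summarize(fold_aucs)
print(f"AUC-ROC: {m:.4f} ± {sd:.4f}")
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; for real clinical datasets a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.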