A Novel AUC Maximization Imbalanced Learning Approach for Predicting Composite Outcomes in COVID-19 Hospitalized Patients

Guanjin Wang; Stephen Wai Hang Kwok; Mohammed Yousufuddin; Ferdous Sohel

doi:10.1109/JBHI.2023.3279824

IEEE J Biomed Health Inform. 2023 Aug;27(8):3794-3805. doi: 10.1109/JBHI.2023.3279824. Epub 2023 Aug 7.

PMID: 37227914

Abstract

COVID-19 patient data for composite outcome prediction often suffer from class imbalance: only a small group of patients develop severe composite events after hospital admission, while the rest do not. An ideal COVID-19 composite outcome prediction model should therefore have strong imbalanced learning capability. It should also have few hyperparameters to tune, to ensure good usability, and should support fast incremental learning. Towards this goal, this study proposes a novel imbalanced learning approach, Imbalanced maximizing-Area Under the Curve (AUC) Proximal Support Vector Machine (ImAUC-PSVM), built on the classical PSVM, to predict the composite outcomes of hospitalized COVID-19 patients within 30 days of hospitalization. ImAUC-PSVM offers the following merits: (1) it incorporates straightforward AUC maximization into the objective function, resulting in fewer parameters to tune, which makes it suitable for handling imbalanced COVID-19 data with a simplified training process; (2) theoretical derivations reveal that ImAUC-PSVM has the same analytical solution form as PSVM, so it inherits PSVM's ability to handle incremental COVID-19 cases through fast incremental updating. We built the proposed classifier and validated it internally and externally using real COVID-19 patient data obtained from three separate Mayo Clinic sites in the United States. We additionally validated it on public datasets using various performance metrics. Experimental results show that ImAUC-PSVM outperforms other methods in most cases, demonstrating its potential to assist clinicians in triaging COVID-19 patients at an early stage in hospital settings, as well as its applicability to other prediction tasks.
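The abstract's point (2) rests on the closed-form solution of the classical linear PSVM, which the sketch below illustrates. It is NOT the authors' ImAUC-PSVM: the abstract does not give the AUC term added to the objective, so only the classical PSVM is shown. For training matrix A, labels D = diag(y) with y in {-1, +1}, and H = [A, -e], classical PSVM has the solution [w; gamma] = (I/nu + H'H)^(-1) H'De. Because H'H and H'De are sums over training rows, a new batch of patients can be folded in by adding its contribution and re-solving a small (n+1) x (n+1) system, which is the kind of fast incremental updating the abstract refers to. The class name LinearPSVM, the method partial_fit, the parameter nu, the helper auc, and the synthetic data are all hypothetical illustrations; the auc helper computes AUC as the Wilcoxon-Mann-Whitney statistic for evaluation only.

```python
import numpy as np


class LinearPSVM:
    """Hypothetical sketch of the classical linear proximal SVM.

    Keeps running sufficient statistics H'H and H'De (H = [A, -e],
    D = diag(y)), so batches can be added without revisiting old data.
    This is not ImAUC-PSVM; it omits any AUC term in the objective.
    """

    def __init__(self, nu=1.0):
        self.nu = nu      # trade-off between fitting error and regularization
        self.HtH = None   # running H'H, shape (n_features + 1, n_features + 1)
        self.HtDe = None  # running H'De, shape (n_features + 1,)
        self.w = None
        self.gamma = None

    def partial_fit(self, A, y):
        A = np.asarray(A, dtype=float)
        y = np.asarray(y, dtype=float)  # labels must be -1 or +1
        H = np.hstack([A, -np.ones((A.shape[0], 1))])
        HtH = H.T @ H
        HtDe = H.T @ y                  # De equals y, so H'De = H'y
        if self.HtH is None:
            self.HtH, self.HtDe = HtH, HtDe
        else:                           # incremental update: just add the new batch's terms
            self.HtH += HtH
            self.HtDe += HtDe
        k = self.HtH.shape[0]
        # Closed form: [w; gamma] = (I/nu + H'H)^(-1) H'De
        z = np.linalg.solve(np.eye(k) / self.nu + self.HtH, self.HtDe)
        self.w, self.gamma = z[:-1], z[-1]
        return self

    def decision_function(self, X):
        return np.asarray(X, dtype=float) @ self.w - self.gamma

    def predict(self, X):
        return np.where(self.decision_function(X) >= 0, 1, -1)


def auc(scores, y):
    """AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    (positive, negative) pairs ranked correctly, ties counted as 1/2."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y)
    pos, neg = scores[y == 1], scores[y == -1]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


# Synthetic imbalanced toy data (hypothetical): 180 negatives, 20 positives.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(180, 2)),
               rng.normal(2.5, 1.0, size=(20, 2))])
y = np.concatenate([-np.ones(180), np.ones(20)])

batch = LinearPSVM(nu=10.0).partial_fit(X, y)
inc = LinearPSVM(nu=10.0).partial_fit(X[:100], y[:100]).partial_fit(X[100:], y[100:])
print(np.allclose(batch.w, inc.w), auc(batch.decision_function(X), y))
```

Fitting all data at once and fitting in two batches give the same (w, gamma) up to floating-point error, which is why a model with this solution form can absorb newly admitted cases cheaply.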

MeSH terms

Area Under Curve
COVID-19*
Hospitalization
Humans
Machine Learning
Prognosis