On the explainability of hospitalization prediction on a large COVID-19 patient dataset

Ivan Girardi; Panagiotis Vagenas; Dario Arcos-Díaz; Lydia Bessaï; Alexander Büsser; Ludovico Furlan; Raffaello Furlan; Mauro Gatti; Andrea Giovannini; Ellen Hoeven; Chiara Marchiori

AMIA Annu Symp Proc. 2022 Feb 21;2021:526-535. eCollection 2021.

Affiliations

¹ IBM Research Europe.
² IBM GBS Germany.
³ IBM GBS Switzerland.
⁴ Fondazione IRCCS Ca' Granda, Ospedale Maggiore Policlinico, Milano, Italy.
⁵ Department of Biomedical Sciences, Humanitas University and IRCCS - Humanitas Research Hospital, Milano, Italy.
⁶ IBM GBS Italy.

PMID: 35308959
PMCID: PMC8861733

Abstract

We develop several AI models to predict hospitalization in a large cohort (over 110k) of US patients who tested positive for COVID-19, with data collected from March 2020 to February 2021. The models range from Random Forest to Neural Network (NN) and Time Convolutional NN, and the two data modalities (tabular and time-dependent) are combined at different stages (early vs. model fusion). Despite strong class imbalance, the models reach average precision 0.96-0.98 (0.75-0.85), recall 0.96-0.98 (0.74-0.85), and F1-score 0.97-0.98 (0.79-0.83) on the non-hospitalized (hospitalized) class. Performance does not drop significantly even when selected lists of features are removed to study how the models adapt to different scenarios. However, a systematic study of the SHAP feature importance values for the developed models across these scenarios shows large variability across models and use cases. This calls for more complete studies of several explainability methods before they are adopted in high-stakes scenarios.
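
The abstract reports precision, recall, and F1-score separately for the non-hospitalized and hospitalized classes. As a minimal sketch of how such per-class figures are obtained under class imbalance, the following pure-Python example computes them from toy labels. The function name `per_class_metrics` and the label vectors are hypothetical and made up for illustration; they are not the authors' code or data.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Return (precision, recall, F1) treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy, made-up labels (assumption): 0 = non-hospitalized (majority class),
# 1 = hospitalized (minority class).
y_true = [0] * 90 + [1] * 10
# 88 correct negatives, 2 false alarms, 8 correct positives, 2 misses.
y_pred = [0] * 88 + [1] * 2 + [1] * 8 + [0] * 2

for label, name in [(0, "non-hospitalized"), (1, "hospitalized")]:
    p, r, f = per_class_metrics(y_true, y_pred, positive=label)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Reporting the metrics per class, rather than a single accuracy figure, keeps the minority (hospitalized) class from being masked by the much larger majority class.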

MeSH terms

COVID-19* / epidemiology
Cohort Studies
Hospitalization
Humans
Neural Networks, Computer