A causal learning framework for the analysis and interpretation of COVID-19 clinical data

Elisa Ferrari; Luna Gargani; Greta Barbieri; Lorenzo Ghiadoni; Francesco Faita; Davide Bacciu

doi:10.1371/journal.pone.0268327

PLoS One. 2022 May 19;17(5):e0268327. doi: 10.1371/journal.pone.0268327. eCollection 2022.

Authors

Elisa Ferrari¹, Luna Gargani², Greta Barbieri³ ⁴, Lorenzo Ghiadoni⁵, Francesco Faita², Davide Bacciu⁶

Affiliations

¹ Scuola Normale Superiore, Pisa, Italy.
² Institute of Clinical Physiology, C.N.R, Pisa, Italy.
³ Department of Surgical, Medical, Molecular and Critical Area Pathology, University of Pisa, Pisa, Italy.
⁴ Emergency Medicine Department, Pisa University Hospital, Pisa, Italy.
⁵ Department of Clinical and Experimental Medicine, University of Pisa, Pisa, Italy.
⁶ Department of Computer Science, University of Pisa, Pisa, Italy.

Abstract

We present a workflow for clinical data analysis that relies on Bayesian Structure Learning (BSL), an unsupervised learning approach that is robust to noise and biases, allows prior medical knowledge to be incorporated into the learning process, and provides explainable results in the form of a graph showing the causal connections among the analyzed features. The workflow is a multi-step approach that goes from identifying the main causes of patient outcome through BSL to building a tool suitable for clinical practice, based on a Binary Decision Tree (BDT), that recognizes high-risk patients using information already available at hospital admission. We evaluate our approach on a feature-rich Coronavirus disease (COVID-19) dataset, showing that the proposed framework provides a schematic overview of the multi-factorial processes that jointly contribute to the outcome. We compare our findings with the current literature on COVID-19, showing that this approach allows established cause-effect relationships about the disease to be rediscovered. Further, our approach yields a highly interpretable tool that correctly predicts the outcome of 85% of subjects based exclusively on 3 features: age, a previous history of chronic obstructive pulmonary disease, and the PaO2/FiO2 ratio at the time of arrival at the hospital. Including additional information from 4 routine blood tests (Creatinine, Glucose, pO2 and Sodium) increases predictive accuracy to 94.5%.
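
To make the final step of the workflow concrete, the following is a minimal sketch of how a binary decision tree could be fit to the three admission-time features named above. It is not the authors' implementation: the abstract does not state the tree-induction algorithm or any thresholds, so the sketch assumes a greedy Gini-impurity split search, and every record in `rows` is synthetic and invented purely for illustration. The names `gini`, `best_split`, `build_tree` and `predict` are hypothetical helpers, not part of any library.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini, else None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a tree greedily; a leaf is a label, a node is (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# SYNTHETIC records, not taken from the study.
# Columns: (age in years, COPD history as 0/1, PaO2/FiO2 ratio at admission).
rows = [
    (45, 0, 380), (52, 0, 310), (58, 1, 290), (63, 0, 350), (67, 0, 150),
    (71, 1, 120), (80, 1, 260), (84, 0, 300), (49, 0, 180), (77, 1, 140),
]
# Hypothetical outcome labels: 1 = poor outcome, 0 = good outcome.
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

tree = build_tree(rows, labels, max_depth=3)
train_accuracy = sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)
print("tree:", tree)
print("training accuracy:", train_accuracy)
```

Because each internal node tests one feature against one threshold, the fitted tree can be read as a short sequence of yes/no questions, which is the kind of interpretability the abstract emphasizes. The accuracy printed here refers only to the toy data and says nothing about the 85% and 94.5% figures reported in the study.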

Publication types

Review

MeSH terms

Bayes Theorem
COVID-19*
Causality
Hospitalization
Humans

Grants and funding

The authors received no specific funding for this work.