Data processing pipeline for cardiogenic shock prediction using machine learning

Nikola Jajcay; Branislav Bezak; Amitai Segev; Shlomi Matetzky; Jana Jankova; Michael Spartalis; Mohammad El Tahlawi; Federico Guerra; Julian Friebel; Tharusan Thevathasan; Imrich Berta; Leo Pölzl; Felix Nägele; Edita Pogran; F Aaysha Cader; Milana Jarakovic; Can Gollmann-Tepeköylü; Marta Kollarova; Katarina Petrikova; Otilia Tica; Konstantin A Krychtiuk; Guido Tavazzi; Carsten Skurk; Kurt Huber; Allan Böhm

doi:10.3389/fcvm.2023.1132680

Front Cardiovasc Med. 2023 Mar 23;10:1132680. doi: 10.3389/fcvm.2023.1132680. eCollection 2023.

Authors

Nikola Jajcay^{1,2}, Branislav Bezak^{1,3,4}, Amitai Segev^{5,6}, Shlomi Matetzky^{5,6}, Jana Jankova¹, Michael Spartalis^{7,8}, Mohammad El Tahlawi⁹, Federico Guerra¹⁰, Julian Friebel¹¹, Tharusan Thevathasan^{11,12,13,14}, Imrich Berta¹, Leo Pölzl¹⁵, Felix Nägele¹⁵, Edita Pogran¹⁶, F Aaysha Cader¹⁷, Milana Jarakovic^{18,19}, Can Gollmann-Tepeköylü¹⁵, Marta Kollarova¹, Katarina Petrikova¹, Otilia Tica^{20,21}, Konstantin A Krychtiuk^{22,23}, Guido Tavazzi^{24,25}, Carsten Skurk^{11,13}, Kurt Huber¹⁶, Allan Böhm^{1,4,26}

Affiliations

¹ Premedix Academy, Bratislava, Slovakia.
² Department of Complex Systems, Institute of Computer Science, Czech Academy of Sciences, Prague, Czech Republic.
³ Clinic of Cardiac Surgery, National Institute of Cardiovascular Diseases, Bratislava, Slovakia.
⁴ Faculty of Medicine, Comenius University in Bratislava, Bratislava, Slovakia.
⁵ The Leviev Cardiothoracic & Vascular Center, Chaim Sheba Medical Center, Ramat Gan, Israel.
⁶ Affiliated to the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.
⁷ 3rd Department of Cardiology, National and Kapodistrian University of Athens, Athens, Greece.
⁸ Global Clinical Scholars Research Training (GCSRT) Program, Harvard Medical School, Boston, MA, United States.
⁹ Department of Cardiology, Faculty of Human Medicine, Zagazig University, Zagazig, Egypt.
¹⁰ Cardiology and Arrhythmology Clinic, Marche Polytechnic University, University Hospital "Umberto I - Lancisi - Salesi", Ancona, Italy.
¹¹ Department of Cardiology, Angiology and Intensive Care Medicine, Deutsches Herzzentrum der Charité (DHZC), Campus Benjamin Franklin, Charité - Universitätsmedizin Berlin, Berlin, Germany.
¹² Berlin Institute of Health, Charité-Universitätsmedizin Berlin, Berlin, Germany.
¹³ Deutsches Zentrum für Herz-Kreislauf-Forschung e.V., Berlin, Germany.
¹⁴ Institute of Medical Informatics, Charité-Universitätsmedizin Berlin, Berlin, Germany.
¹⁵ Department for Cardiac Surgery, Cardiac Regeneration Research, Medical University of Innsbruck, Innsbruck, Austria.
¹⁶ 3rd Medical Department, Cardiology and Intensive Care Medicine, Wilhelminen Hospital, Vienna, Austria.
¹⁷ Department of Cardiology, Ibrahim Cardiac Hospital & Research Institute, Dhaka, Bangladesh.
¹⁸ Cardiac Intensive Care Unit, Institute for Cardiovascular Diseases of Vojvodina, Sremska Kamenica, Serbia.
¹⁹ Faculty of Medicine, University of Novi Sad, Novi Sad, Serbia.
²⁰ Cardiology Department, Emergency County Clinical Hospital of Oradea, Oradea, Romania.
²¹ Institute of Cardiovascular Sciences, University of Birmingham, Medical School, Birmingham, United Kingdom.
²² Department of Internal Medicine II, Division of Cardiology, Medical University of Vienna, Vienna, Austria.
²³ Duke Clinical Research Institute, Durham, NC, United States.
²⁴ Department of Clinical-Surgical, Diagnostic and Paediatric Sciences, University of Pavia, Pavia, Italy.
²⁵ Anesthesia and Intensive Care, Fondazione Policlinico San Matteo Hospital IRCCS, Pavia, Italy.
²⁶ Department of Acute Cardiology, National Institute of Cardiovascular Diseases, Bratislava, Slovakia.

Abstract

Introduction: Recent advances in machine learning provide new possibilities to process and analyse observational patient data to predict patient outcomes. In this paper, we introduce a data processing pipeline for predicting cardiogenic shock (CS) in intensive cardiac care unit patients with acute coronary syndrome, using the MIMIC-III database. The ability to identify high-risk patients could allow pre-emptive measures to be taken and thus prevent the development of CS.

Methods: We focus mainly on the imputation of missing data: we build an imputation pipeline and compare the performance of several multivariate imputation algorithms, including k-nearest neighbours, two singular value decomposition (SVD)-based methods, and Multiple Imputation by Chained Equations (MICE). After imputation, we select the final subjects and variables from the imputed dataset and demonstrate the performance of a gradient-boosted framework that uses a tree-based classifier for CS prediction.
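
As a rough illustration of how such an imputer comparison can be set up, the sketch below hides a random fraction of known values in synthetic data and scores each imputer by the root-mean-square error on the hidden cells. It is not the authors' pipeline: the data, the helper names `svd_impute` and `compare_imputers`, and all parameter values are assumptions made for this example. Only one simple SVD-based variant is shown, and scikit-learn's `IterativeImputer` is used as a chained-equations imputer in the spirit of MICE (it returns a single imputation by default).

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (exposes IterativeImputer)
from sklearn.impute import IterativeImputer, KNNImputer


def svd_impute(X, rank=2, n_iter=50):
    """Iterative low-rank SVD imputation (hypothetical helper, not the authors' code)."""
    missing = np.isnan(X)
    # Start from column means, then repeatedly overwrite the missing cells
    # with the values of a rank-`rank` reconstruction of the filled matrix.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled[missing] = low_rank[missing]
    return filled


def compare_imputers(n_samples=200, n_features=6, missing_rate=0.2, seed=0):
    """Return the RMSE of each imputer on artificially hidden cells."""
    rng = np.random.default_rng(seed)
    # Synthetic correlated data (assumption): two latent factors plus noise.
    latent = rng.normal(size=(n_samples, 2))
    loadings = rng.normal(size=(2, n_features))
    X_true = latent @ loadings + 0.1 * rng.normal(size=(n_samples, n_features))

    # Hide a random subset of cells so the true values are known for scoring.
    hidden = rng.random(X_true.shape) < missing_rate
    X_missing = X_true.copy()
    X_missing[hidden] = np.nan

    imputers = {
        "knn": lambda X: KNNImputer(n_neighbors=5).fit_transform(X),
        "svd": lambda X: svd_impute(X, rank=2),
        "mice_like": lambda X: IterativeImputer(
            max_iter=10, random_state=seed
        ).fit_transform(X),
    }
    scores = {}
    for name, impute in imputers.items():
        X_hat = impute(X_missing)
        scores[name] = float(np.sqrt(np.mean((X_hat[hidden] - X_true[hidden]) ** 2)))
    return scores


if __name__ == "__main__":
    for name, rmse in compare_imputers().items():
        print(f"{name}: RMSE on hidden cells = {rmse:.3f}")
```

Scoring on artificially hidden cells is one common way to rank imputers when the true values of the genuinely missing entries are unknown; the ranking obtained on synthetic data says nothing about which method performs best on the clinical dataset.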

Results: With data cleaning and imputation, and without hyperparameter optimization, the classifier achieved a cross-validated mean area under the curve (AUC) of 0.805.
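
For readers unfamiliar with the metric, the following minimal sketch shows how a cross-validated mean AUC can be computed for a gradient-boosted tree classifier with default hyperparameters. Everything here is an assumption for illustration: the data are synthetic and imbalanced, the helper `cross_validated_auc` is hypothetical, and scikit-learn's `GradientBoostingClassifier` stands in for whichever gradient-boosting framework the authors used. It will not reproduce the reported value.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def cross_validated_auc(n_splits=5, seed=0):
    """Return per-fold ROC AUC of a default gradient-boosted classifier."""
    # Synthetic, imbalanced binary problem (assumption): about 10% positives.
    X, y = make_classification(
        n_samples=500,
        n_features=20,
        n_informative=5,
        weights=[0.9, 0.1],
        random_state=seed,
    )
    # Default hyperparameters, i.e. no hyperparameter optimization.
    clf = GradientBoostingClassifier(random_state=seed)
    # Stratified folds keep the class ratio similar in every split.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")


if __name__ == "__main__":
    aucs = cross_validated_auc()
    print(f"mean AUC over {len(aucs)} folds: {aucs.mean():.3f}")
```

Stratified splitting is chosen here because a rare outcome could otherwise be nearly absent from some folds, which would make the per-fold AUC unstable.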

Conclusion: We believe our pre-processing pipeline would also prove helpful for other classification and regression experiments.

Keywords: cardiogenic shock; classification; machine learning; missing data imputation; prediction model; processing pipeline.

© 2023 Jajcay, Bezak, Segev, Matetzky, Jankova, Spartalis, El Tahlawi, Guerra, Friebel, Thevathasan, Berta, Pölzl, Nägele, Pogran, Cader, Jarakovic, Gollmann-Tepeköylü, Kollarova, Petrikova, Tica, Krychtiuk, Tavazzi, Skurk, Huber and Böhm.

Grants and funding

This research was partially supported by the Scientific Grant Agency of the Ministry of Education, Science, Research and Sport of the Slovak Republic grant (VEGA 1/0563/21).