A Risk-Stratification Machine Learning Framework for the Prediction of Coronary Artery Disease Severity: Insights From the GESS Trial

Front Cardiovasc Med. 2022 Jan 18:8:812182. doi: 10.3389/fcvm.2021.812182. eCollection 2021.

Abstract

Our study aims to develop a data-driven framework utilizing heterogenous electronic medical and clinical records and advanced Machine Learning (ML) approaches for: (i) the identification of critical risk factors affecting the complexity of Coronary Artery Disease (CAD), as assessed via the SYNTAX score; and (ii) the development of ML prediction models for accurate estimation of the expected SYNTAX score. We propose a two-part modeling technique separating the process into two distinct phases: (a) a binary classification task for predicting, whether a patient is more likely to present with a non-zero SYNTAX score; and (b) a regression task to predict the expected SYNTAX score accountable to individual patients with a non-zero SYNTAX score. The framework is based on data collected from the GESS trial (NCT03150680) comprising electronic medical and clinical records for 303 adult patients with suspected CAD, having undergone invasive coronary angiography in AHEPA University Hospital of Thessaloniki, Greece. The deployment of the proposed approach demonstrated that atherogenic index of plasma levels, diabetes mellitus and hypertension can be considered as important risk factors for discriminating patients into zero- and non-zero SYNTAX score groups, whereas diastolic and systolic arterial blood pressure, peripheral vascular disease and body mass index can be considered as significant risk factors for providing an accurate estimation of the expected SYNTAX score, given that a patient belongs to the non-zero SYNTAX score group. The experimental findings utilizing the identified set of important risk factors indicate a sufficient prediction performance for the Support Vector Machine model (classification task) with an F-measure score of ~0.71 and the Support Vector Regression model (regression task) with a median absolute error value of ~6.5. The proposed data-driven framework described herein present evidence of the prediction capacity and the potential clinical usefulness of the developed risk-stratification models. However, further experimentation in a larger clinical setting is needed to ensure the practical utility of the presented models in a way to contribute to a more personalized management and counseling of CAD patients.

Keywords: SYNTAX score; coronary artery disease; machine learning; personalized (precision) medicine; risk-stratification model.