The roles of predictors in cardiovascular risk models - a question of modeling culture?

Christine Wallisch; Asan Agibetov; Daniela Dunkler; Maria Haller; Matthias Samwald; Georg Dorffner; Georg Heinze

doi:10.1186/s12874-021-01487-4

BMC Med Res Methodol. 2021 Dec 18;21(1):284. doi: 10.1186/s12874-021-01487-4.

Authors

Christine Wallisch¹, Asan Agibetov², Daniela Dunkler¹, Maria Haller¹,³, Matthias Samwald², Georg Dorffner², Georg Heinze⁴

Affiliations

¹ Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Spitalgasse 23, 1090, Vienna, Austria.
² Section for Artificial Intelligence and Decision Support, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, Austria.
³ Department of Nephrology, Ordensklinikum Linz, Hospital Elisabethinen, Linz, Austria.
⁴ Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Spitalgasse 23, 1090, Vienna, Austria. georg.heinze@meduniwien.ac.at.

Abstract

Background: While machine learning (ML) algorithms may predict cardiovascular outcomes more accurately than statistical models, their result is usually not representable by a transparent formula. Hence, it is often unclear how specific values of predictors lead to the predictions. We aimed to demonstrate with graphical tools how predictor-risk relations in cardiovascular risk prediction models fitted by ML algorithms and by statistical approaches may differ, and how sample size affects the stability of the estimated relations.

Methods: We reanalyzed data from a large registry of 1.5 million participants in a national health screening program. Three data analysts developed analytical strategies to predict cardiovascular events within 1 year from health screening. This was done for the full data set and with gradually reduced sample sizes, and each data analyst followed their favorite modeling approach. Predictor-risk relations were visualized by partial dependence and individual conditional expectation plots.
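As a reader's aid, the two plot types named in the Methods can be illustrated with a minimal sketch. An individual conditional expectation (ICE) curve traces one participant's predicted risk as a single predictor is varied while that participant's other predictor values are held fixed. A partial dependence curve is the pointwise average of the ICE curves. The model `toy_risk`, its coefficients, the predictor names `age` and `sbp`, and the three example rows are hypothetical and chosen only for illustration; none of them comes from the study.

```python
import math


def ice_curves(predict, rows, feature, grid):
    """Return one ICE curve per row.

    Each curve holds the model's predictions as `feature` is set to every
    value in `grid`, with the row's other predictors left as observed.
    """
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # overwrite only the predictor of interest
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves):
    """Average the ICE curves pointwise over all rows."""
    n = len(curves)
    return [sum(column) / n for column in zip(*curves)]


def toy_risk(row):
    """Hypothetical logistic risk model (assumption, not the paper's model)."""
    linear_predictor = -7.0 + 0.06 * row["age"] + 0.01 * row["sbp"]
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical participants and grid of ages to evaluate.
rows = [
    {"age": 45, "sbp": 120},
    {"age": 60, "sbp": 150},
    {"age": 70, "sbp": 135},
]
grid = [40, 50, 60, 70, 80]

curves = ice_curves(toy_risk, rows, "age", grid)
pdp = partial_dependence(curves)
```

Plotting each element of `curves` against `grid` gives the ICE plot, and plotting `pdp` against `grid` gives the partial dependence plot. Any fitted model that exposes a prediction function could take the place of `toy_risk`, which is what makes these plots usable for comparing different modeling algorithms.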

Results: When comparing the modeling algorithms, we found some similarities between these visualizations but also occasional divergence. The smaller the sample size, the more the predictor-risk relation depended on the modeling algorithm used, and the larger the role played by sampling variability. Predictive performance was similar when the models were derived on the full data set, whereas smaller sample sizes favored simpler models.

Conclusion: Predictor-risk relations from ML models may differ from those obtained by statistical models, even with large sample sizes. Hence, predictors may assume different roles in risk prediction models. As long as sample size is sufficient, predictive accuracy is not largely affected by the choice of algorithm.

Keywords: Cardiovascular risk; Non-linear effect; Partial dependence plots; Prediction model; Predictors.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Cardiovascular Diseases* / diagnosis
Cardiovascular Diseases* / epidemiology
Heart Disease Risk Factors
Humans
Machine Learning
Models, Statistical
Risk Factors