The roles of predictors in cardiovascular risk models - a question of modeling culture?

BMC Med Res Methodol. 2021 Dec 18;21(1):284. doi: 10.1186/s12874-021-01487-4.

Abstract

Background: While machine learning (ML) algorithms may predict cardiovascular outcomes more accurately than statistical models, their result is usually not representable by a transparent formula. Hence, it is often unclear how specific values of predictors lead to the predictions. We aimed to demonstrate with graphical tools how predictor-risk relations in cardiovascular risk prediction models fitted by ML algorithms and by statistical approaches may differ, and how sample size affects the stability of the estimated relations.

Methods: We reanalyzed data from a large registry of 1.5 million participants in a national health screening program. Three data analysts developed analytical strategies to predict cardiovascular events within 1 year from health screening. This was done for the full data set and with gradually reduced sample sizes, and each data analyst followed their favorite modeling approach. Predictor-risk relations were visualized by partial dependence and individual conditional expectation plots.

Results: When comparing the modeling algorithms, we found some similarities between these visualizations but also occasional divergence. The smaller the sample size, the more the predictor-risk relation depended on the modeling algorithm used, and also sampling variability played an increased role. Predictive performance was similar if the models were derived on the full data set, whereas smaller sample sizes favored simpler models.

Conclusion: Predictor-risk relations from ML models may differ from those obtained by statistical models, even with large sample sizes. Hence, predictors may assume different roles in risk prediction models. As long as sample size is sufficient, predictive accuracy is not largely affected by the choice of algorithm.

Keywords: Cardiovascular risk; Non-linear effect; Partial dependence plots; Prediction model; Predictors.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cardiovascular Diseases* / diagnosis
  • Cardiovascular Diseases* / epidemiology
  • Heart Disease Risk Factors
  • Humans
  • Machine Learning
  • Models, Statistical
  • Risk Factors