Machine learning methodologies versus cardiovascular risk scores, in predicting disease risk

Alexandros C Dimopoulos; Mara Nikolaidou; Francisco Félix Caballero; Worrawat Engchuan; Albert Sanchez-Niubo; Holger Arndt; José Luis Ayuso-Mateos; Josep Maria Haro; Somnath Chatterji; Ekavi N Georgousopoulou; Christos Pitsavos; Demosthenes B Panagiotakos

doi:10.1186/s12874-018-0644-1

BMC Med Res Methodol. 2018 Dec 29;18(1):179. doi: 10.1186/s12874-018-0644-1.

Authors

Alexandros C Dimopoulos¹ ², Mara Nikolaidou², Francisco Félix Caballero³ ⁴, Worrawat Engchuan⁵, Albert Sanchez-Niubo⁶ ⁷, Holger Arndt⁸, José Luis Ayuso-Mateos³ ⁹, Josep Maria Haro⁴ ⁶, Somnath Chatterji¹⁰, Ekavi N Georgousopoulou¹ ¹¹, Christos Pitsavos¹², Demosthenes B Panagiotakos¹³ ¹⁴

Affiliations

¹ Department of Nutrition and Dietetics, School of Health Science and Education, Harokopio University, Athens, Greece.
² Department of Informatics & Telematics, School of Digital Technology, Harokopio University, Athens, Greece.
³ Department of Preventive Medicine and Public Health, Universidad Autónoma de Madrid, Madrid, Spain.
⁴ CIBER of Epidemiology and Public Health, Madrid, Spain.
⁵ The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada.
⁶ Parc Sanitari Sant Joan de Déu, Barcelona, Spain.
⁷ CIBER of Mental Health, Madrid, Spain.
⁸ SPRING TECHNO GMBH & Co. KG, Bremen, Germany.
⁹ Hospital Universitario de La Princesa, Instituto de Investigación Sanitaria Princesa (IP), Madrid, Spain.
¹⁰ Health Metrics and Measurement, World Health Organization, Geneva, Switzerland.
¹¹ Faculty of Health, University of Canberra, Canberra, ACT, Australia.
¹² School of Medicine, University of Athens, Athens, Greece.
¹³ Department of Nutrition and Dietetics, School of Health Science and Education, Harokopio University, Athens, Greece. d.b.panagiotakos@usa.net.
¹⁴ Faculty of Health, University of Canberra, Canberra, ACT, Australia. d.b.panagiotakos@usa.net.

Abstract

Background: The use of cardiovascular disease (CVD) risk estimation scores in primary prevention has long been established. However, their performance remains a matter of concern. The aim of this study was to explore the potential of machine learning (ML) methodologies for CVD prediction, especially compared to an established risk tool, the HellenicSCORE.

Methods: Data from the ATTICA prospective study (n = 2020 adults), enrolled during 2001-02 and followed up in 2011-12, were used. Three different machine learning classifiers (k-NN, random forest, and decision tree) were trained and evaluated against 10-year CVD incidence, in comparison with the HellenicSCORE tool (a calibration of the ESC SCORE). Training datasets, ranging from 16 variables down to only 5, were chosen, with or without bootstrapping, in an attempt to achieve the best overall performance for the machine learning classifiers.
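
The abstract does not say how bootstrapping was applied to the training data. As a minimal sketch of the general idea only, assuming plain resampling with replacement, a bootstrapped training set could be drawn as follows. The function name bootstrap_sample and the toy rows are hypothetical and are not taken from the ATTICA data.

```python
import random


def bootstrap_sample(rows, rng):
    """Return a resample of `rows` of the same size, drawn with replacement."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]


# Toy records of the form (participant id, outcome label); purely illustrative.
rows = [("p%d" % i, i % 2) for i in range(10)]

# A seeded generator keeps the resample reproducible.
rng = random.Random(0)
resampled = bootstrap_sample(rows, rng)
```

Because the draw is with replacement, some records typically appear more than once in the resample and others not at all.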

Results: Depending on the classifier and the training dataset, the outcome varied in efficiency but was comparable between the two methodological approaches. In particular, the HellenicSCORE showed an accuracy of 85%, specificity 20%, sensitivity 97%, positive predictive value 87%, and negative predictive value 58%, whereas for the machine learning methodologies, accuracy ranged from 65 to 84%, specificity from 46 to 56%, sensitivity from 67 to 89%, positive predictive value from 89 to 91%, and negative predictive value from 24 to 45%; random forest gave the best results, while k-NN gave the poorest.
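
The five measures reported above are all derived from a 2x2 confusion matrix. The following is a minimal sketch, assuming the standard definitions; the function name classification_metrics and the counts used are hypothetical and are not the study's data. Which outcome is treated as the "positive" class is a convention that the abstract does not spell out.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic measures from confusion-matrix counts.

    tp/fp: true/false positives; tn/fn: true/false negatives.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Invented counts, for illustration only.
metrics = classification_metrics(tp=80, fp=20, tn=30, fn=20)
```

A classifier can combine high sensitivity with low specificity, as the HellenicSCORE figures above show, which is why the abstract reports all five measures rather than accuracy alone.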

Conclusions: The alternative approach of machine learning classification produced results comparable to those of risk prediction scores and, thus, can be used as a method of CVD prediction, taking into consideration the advantages that machine learning methodologies may offer.

Keywords: Cardiovascular disease; Machine learning; Model performance; Risk prediction.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Adult
Algorithms*
Blood Pressure / physiology
Cardiovascular Diseases / diagnosis*
Cardiovascular Diseases / physiopathology
Female
Humans
Machine Learning*
Male
Middle Aged
Models, Cardiovascular*
Prospective Studies
Reproducibility of Results
Risk Assessment / methods
Risk Assessment / statistics & numerical data*
Risk Factors
Sensitivity and Specificity

Grants and funding

001/WHO_/World Health Organization/International