Comparisons among Machine Learning Models for the Prediction of Hypercholesterolemia Associated with Exposure to Lead, Mercury, and Cadmium

Hyejin Park; Kisok Kim

doi:10.3390/ijerph16152666

Int J Environ Res Public Health. 2019 Jul 25;16(15):2666. doi: 10.3390/ijerph16152666.

Authors

Hyejin Park¹, Kisok Kim²

Affiliations

¹ Department of International Healthcare Administration, Daegu Catholic University, Gyeongsan 38430, Korea.
² College of Pharmacy, Keimyung University, Daegu 42601, Korea. kimkisok@kmu.ac.kr.

Abstract

Lead, mercury, and cadmium are common environmental pollutants in industrialized countries, but their combined impact on hypercholesterolemia (HC) is poorly understood. The aim of this study was to compare the performance of several machine learning (ML) models in predicting the prevalence of HC associated with exposure to lead, mercury, and cadmium. A total of 10,089 participants in the Korea National Health and Nutrition Examination Surveys 2008-2013 were selected, and their demographic characteristics, blood metal concentrations, and total cholesterol levels were collected for analysis. Five ML models were constructed and their predictive performance compared: logistic regression (LR), k-nearest neighbors, decision trees, random forests, and support vector machines (SVM). Of the five, the SVM model was the most accurate, and the LR model had the highest area under the receiver operating characteristic (ROC) curve, 0.718 (95% CI: 0.688-0.748). This study shows the potential of various ML methods to predict HC associated with exposure to metals using population-based survey data.
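
The abstract ranks the models by two different metrics, accuracy and area under the ROC curve, and reports a 95% CI for the latter. The sketch below shows one way such metrics can be computed from a classifier's predicted scores. It is an illustration only, not the authors' code: the function names, the toy data, and the percentile bootstrap used for the interval are all assumptions, since the abstract does not state how the confidence interval was derived.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method,
    chosen for illustration; not taken from the paper)."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one of the classes; draw again
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy example: 1 = hypercholesterolemia, 0 = no hypercholesterolemia (made-up values).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]                 # hypothetical predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # labels at an assumed 0.5 cutoff

print("accuracy:", accuracy(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))          # 0.75
print("95% CI:", bootstrap_auc_ci(y_true, scores))
```

Accuracy depends on the cutoff used to turn scores into labels, whereas the AUC summarizes ranking quality across all cutoffs. This is why one model (here, SVM) can be the most accurate while another (here, LR) has the highest AUC.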

Keywords: cholesterol; heavy metals; machine learning; predictive model.

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

Adult
Cadmium / toxicity*
Decision Trees
Environmental Exposure / statistics & numerical data*
Environmental Pollutants / toxicity*
Female
Forecasting
Humans
Hypercholesterolemia / etiology*
Lead / toxicity*
Logistic Models
Machine Learning*
Male
Mercury / toxicity*
Middle Aged
ROC Curve
Republic of Korea / epidemiology
Support Vector Machine

Substances

Environmental Pollutants
Cadmium
Lead
Mercury