Comparisons among Machine Learning Models for the Prediction of Hypercholesterolemia Associated with Exposure to Lead, Mercury, and Cadmium

Int J Environ Res Public Health. 2019 Jul 25;16(15):2666. doi: 10.3390/ijerph16152666.

Abstract

Lead, mercury, and cadmium are common environmental pollutants in industrialized countries, but their combined impact on hypercholesterolemia (HC) is poorly understood. The aim of this study was to compare the performance of various machine learning (ML) models in predicting the prevalence of HC associated with exposure to lead, mercury, and cadmium. A total of 10,089 participants in the Korea National Health and Nutrition Examination Surveys 2008-2013 were selected, and their demographic characteristics, blood concentrations of metals, and total cholesterol levels were collected for analysis. For prediction, five ML models (logistic regression (LR), k-nearest neighbors, decision trees, random forests, and support vector machines (SVM)) were constructed and their predictive performances were compared. Of the five ML models, the SVM model was the most accurate, and the LR model had the highest area under the receiver operating characteristic (ROC) curve, 0.718 (95% CI: 0.688-0.748). This study shows the potential of various ML methods to predict HC associated with exposure to metals using population-based survey data.
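The abstract reports two different metrics, accuracy and area under the ROC curve, and notes that they favored different models. The following is a minimal sketch of how both metrics, plus a 95% confidence interval for the AUC, can be computed from predicted scores. It is not the authors' method: the abstract does not say how the interval was obtained, so the percentile bootstrap here is an assumption, the function names (`roc_auc`, `accuracy`, `bootstrap_ci`) are hypothetical, and the data are synthetic rather than taken from the survey.

```python
import random


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def bootstrap_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class, AUC is undefined
        stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    k = int((alpha / 2) * n_boot)
    return stats[k], stats[n_boot - 1 - k]


# Synthetic example: roughly 30% positives, with scores shifted upward for positives.
rng = random.Random(42)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(300)]
scores = [min(1.0, max(0.0, rng.gauss(0.45 if y else 0.30, 0.15))) for y in labels]

auc = roc_auc(labels, scores)
lo, hi = bootstrap_ci(labels, scores)
print(f"accuracy={accuracy(labels, scores):.3f}  AUC={auc:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

Accuracy depends on the chosen threshold and on class balance, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two metrics can rank models differently.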

Keywords: cholesterol; heavy metals; machine learning; predictive model.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Cadmium / toxicity*
  • Decision Trees
  • Environmental Exposure / statistics & numerical data*
  • Environmental Pollutants / toxicity*
  • Female
  • Forecasting
  • Humans
  • Hypercholesterolemia / etiology*
  • Lead / toxicity*
  • Logistic Models
  • Machine Learning*
  • Male
  • Mercury / toxicity*
  • Middle Aged
  • ROC Curve
  • Republic of Korea / epidemiology
  • Support Vector Machine

Substances

  • Environmental Pollutants
  • Cadmium
  • Lead
  • Mercury