Predicting Low-Level Childhood Lead Exposure in Metro Atlanta Using Ensemble Machine Learning of High-Resolution Raster Cells

Int J Environ Res Public Health. 2023 Mar 2;20(5):4477. doi: 10.3390/ijerph20054477.

Abstract

Low-level lead exposure in children is a major public health issue. Higher-resolution spatial targeting would significantly improve county and state-wide policies and programs for lead exposure prevention that generally intervene across large geographic areas. We use stack-ensemble machine learning, including an elastic net generalized linear model, gradient-boosted machine, and deep neural network, to predict the number of children with venous blood lead levels (BLLs) ≥2 to <5 µg/dL and ≥5 µg/dL in ~1 km² raster cells in the metro Atlanta region using a sample of 92,792 children ≤5 years old screened between 2010 and 2018. Permutation-based predictor importance and partial dependence plots were used for interpretation. Maps of predicted vs. observed values were generated to compare model performance. The density of air-based toxic release facilities (from the EPA Toxic Release Inventory), the percentage of the population below the poverty threshold, crime, and road network density were positively associated with the number of children with low-level lead exposure, whereas the percentage of the white population was inversely associated. While predictions generally matched observed values, cells with high counts of lead exposure were underestimated. High-resolution geographic prediction of lead-exposed children using ensemble machine learning is a promising approach to enhance lead prevention efforts.
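As a rough illustration of the permutation-based predictor importance mentioned above, the following minimal Python sketch shuffles one predictor at a time and records how much the prediction error grows. It is not the authors' pipeline: the function names (`permutation_importance`, `toy_model`), the choice of mean squared error, and the two-column synthetic data are all assumptions made for illustration, and no study data are used.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """For each predictor column, return the mean increase in MSE
    observed after randomly shuffling that column (hypothetical helper)."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving all other predictors intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Stand-in for a fitted model (assumption): uses only the first predictor."""
    return 3.0 * row[0]


# Synthetic data, made up for this sketch: the outcome depends on column 0 only.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [3.0 * a for a, _ in X]

scores = permutation_importance(toy_model, X, y)
print(scores)  # column 0 gets a positive score; column 1, which the model ignores, gets 0
```

A predictor the model relies on produces a large error increase when shuffled, while an ignored predictor produces none. The score reflects only how strongly the model depends on a predictor, not the direction of the association, which is why the abstract pairs it with partial dependence plots.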

Keywords: geographic prediction; lead exposure; machine learning; primary prevention.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Child
  • Child, Preschool
  • Humans
  • Lead Poisoning* / epidemiology
  • Lead*
  • Linear Models
  • Machine Learning
  • Poverty

Substances

  • Lead