Machine learning approaches to characterize the obesogenic urban exposome

Environ Int. 2022 Jan:158:107015. doi: 10.1016/j.envint.2021.107015. Epub 2021 Dec 3.

Abstract

Background: Characteristics of the urban environment may contain upstream drivers of obesity. However, research is lacking that considers the combination of environmental factors simultaneously.

Objectives: We aimed to explore what environmental factors of the urban exposome are related to body mass index (BMI), and evaluated the consistency of findings across multiple statistical approaches.

Methods: A cross-sectional analysis was conducted using baseline data from 14,829 participants of the Occupational and Environmental Health Cohort study. BMI was obtained from self-reported height and weight. Geocoded exposures linked to individual home addresses (using 6-digit postcode) of 86 environmental factors were estimated, including air pollution, traffic noise, green-space, built environmental and neighborhood socio-demographic characteristics. Exposure-obesity associations were identified using the following approaches: sparse group Partial Least Squares, Bayesian Model Averaging, penalized regression using the Minimax Concave Penalty, Generalized Additive Model-based boosting Random Forest, Extreme Gradient Boosting, and Multiple Linear Regression, as the most conventional approach. The models were adjusted for individual socio-demographic variables. Environmental factors were ranked according to variable importance scores attributed by each approach and median ranks were calculated across these scores to identify the most consistent associations.

Results: The most consistent environmental factors associated with BMI were the average neighborhood value of the homes, oxidative potential of particulate matter air pollution (OP), healthy food outlets in the neighborhood (5 km buffer), low-income neighborhoods, and one-person households in the neighborhood. Higher BMI levels were observed in low-income neighborhoods, with lower average house values, lower share of one-person households and smaller amount of healthy food retailers. Higher BMI levels were observed in low-income neighborhoods, with lower average house values, lower share of one-person households, smaller amounts of healthy food retailers and higher OP levels. Across the approaches, we observed consistent patterns of results based on model's capacity to incorporate linear or nonlinear associations.

Discussion: The pluralistic analysis on environmental obesogens strengthens the existing evidence on the role of neighborhood socioeconomic position, urbanicity and air pollution.

Keywords: Air pollution; Exposome; Extreme gradient boosting (XGBoost); Random forest; Shapley values; Socioeconomic position.

MeSH terms

  • Air Pollutants* / analysis
  • Air Pollution* / analysis
  • Bayes Theorem
  • Cohort Studies
  • Cross-Sectional Studies
  • Environmental Exposure / analysis
  • Exposome*
  • Humans
  • Machine Learning
  • Residence Characteristics

Substances

  • Air Pollutants