Bayesian variable selection for understanding mixtures in environmental exposures

Stat Med. 2021 Sep 30;40(22):4850-4871. doi: 10.1002/sim.9099. Epub 2021 Jun 15.

Abstract

Social and environmental stressors are crucial factors in child development. However, there exists a multitude of measurable social and environmental factors, the effects of which may be cumulative, interactive, or null. Using a comprehensive cohort of children in North Carolina, we study the impact of social and environmental variables on 4th grade end-of-grade exam scores in reading and mathematics. To identify the essential factors that predict these educational outcomes, we design new tools for Bayesian linear variable selection using decision analysis. We extract a predictively optimal subset of explanatory variables by coupling a loss function with a novel model-based penalization scheme, which leads to coherent Bayesian decision analysis and empirically improves variable selection, estimation, and prediction on simulated data. The Bayesian linear model propagates uncertainty quantification to all predictive evaluations, which is important for interpretable and robust model comparisons. These predictive comparisons are conducted out-of-sample with a customized approximation algorithm that avoids computationally intensive model refitting. We apply our variable selection techniques to identify the joint collection of social and environmental stressors, and their interactions, that offer clear and quantifiable improvements in prediction of reading and mathematics exam scores.
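
As a rough illustration of the general idea of variable selection by decision analysis, the sketch below fits a conjugate Bayesian linear model and then searches for the subset of predictors whose linear summary best reproduces the model's fitted values, with a charge for each included variable. It is not the authors' method: the simulated data, the known noise variance sigma2, the prior variance tau2, the per-variable penalty lam, and the helper subset_loss are all assumptions made for this example, and the fixed per-variable penalty is a simple stand-in for the model-based penalization scheme described in the abstract.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical): 6 candidate predictors, only the first two matter.
n, p = 200, 6
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.standard_normal(n)

# Bayesian linear model with prior beta ~ N(0, tau2 * I) and known noise
# variance sigma2 (both assumed here for simplicity).
sigma2, tau2 = 1.0, 10.0
post_cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / tau2)
post_mean = post_cov @ X.T @ y / sigma2
y_hat = X @ post_mean  # posterior predictive mean at the observed covariates


def subset_loss(subset):
    """Mean squared error of the best linear summary of y_hat using `subset`."""
    if not subset:
        return float(np.mean(y_hat ** 2))
    Xs = X[:, list(subset)]
    coef, *_ = np.linalg.lstsq(Xs, y_hat, rcond=None)
    return float(np.mean((y_hat - Xs @ coef) ** 2))


# Exhaustive search: the best subset of each size under the predictive loss.
best_by_size = {
    k: min(itertools.combinations(range(p), k), key=subset_loss)
    for k in range(p + 1)
}

# Decision step: trade predictive loss against subset size.
# lam is an assumed per-variable penalty, not the paper's penalization scheme.
lam = 0.1
selected = min(best_by_size.values(), key=lambda s: subset_loss(s) + lam * len(s))

print("selected predictors:", selected)
```

The exhaustive search is only feasible for a handful of predictors. The design point it illustrates is that selection is performed on the fitted Bayesian model's predictions rather than by refitting a separate model to the raw outcome for each candidate subset.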

Keywords: air quality; educational outcomes; lead; prediction; regression.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bayes Theorem
  • Child
  • Cohort Studies
  • Environmental Exposure* / adverse effects
  • Humans
  • North Carolina