Principal Component Pursuit for Pattern Identification in Environmental Mixtures

Environ Health Perspect. 2022 Nov;130(11):117008. doi: 10.1289/EHP10479. Epub 2022 Nov 23.

Abstract

Background: Environmental health researchers often aim to identify sources or behaviors that give rise to potentially harmful environmental exposures.

Objective: We adapted principal component pursuit (PCP), a robust and well-established technique for dimensionality reduction in computer vision and signal processing, to identify patterns in environmental mixtures. PCP decomposes the exposure mixture into a low-rank matrix containing consistent patterns of exposure across pollutants and a sparse matrix isolating unique or extreme exposure events.
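To make the low-rank plus sparse decomposition concrete, the following is a minimal sketch of classical convex PCP solved with an augmented Lagrangian iteration. It is not the PCP-LOD method of this article: it ignores nonnegativity, missing values, and limits of detection. The function names, the default penalty lam = 1/sqrt(max(n, p)), the step parameter mu, and the simulated data are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(X, tau):
    """Shrink every entry of X toward zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Shrink the singular values of X toward zero by tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt

def pcp(D, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Classical PCP sketch: split D into low-rank L and sparse S with L + S = D.

    Minimizes ||L||_* + lam * ||S||_1 subject to L + S = D.
    The defaults for lam and mu are common heuristics, assumed here.
    """
    D = np.asarray(D, dtype=float)
    n, p = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n, p))
    if mu is None:
        mu = n * p / (4.0 * np.abs(D).sum())
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    Y = np.zeros_like(D)  # dual variable for the constraint L + S = D
    norm_D = np.linalg.norm(D)
    for _ in range(max_iter):
        L = svd_threshold(D - S + Y / mu, 1.0 / mu)
        S = soft_threshold(D - L + Y / mu, lam / mu)
        residual = D - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_D:
            break
    return L, S

# Hypothetical example: a rank-2 "mixture" with 5% large sparse spikes.
rng = np.random.default_rng(0)
L0 = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 20))
S0 = np.zeros((100, 20))
mask = rng.random((100, 20)) < 0.05
S0[mask] = 10.0 * rng.choice([-1.0, 1.0], size=int(mask.sum()))
D = L0 + S0
L_hat, S_hat = pcp(D, max_iter=5000)
```

In this sketch, L_hat plays the role of the consistent exposure patterns and S_hat the role of the extreme events; the article's variant additionally handles values below the LOD, which this code does not attempt.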

Methods: We adapted PCP to accommodate nonnegative data, missing data, and values below a given limit of detection (LOD). We simulated data to represent environmental mixtures of two sizes with increasing proportions of values <LOD and three noise structures. We applied principal component pursuit with limit of detection (PCP-LOD) to the simulated data to evaluate its performance in comparison with principal component analysis (PCA). We next applied PCP-LOD to an exposure mixture of 21 persistent organic pollutants (POPs) measured in 1,000 U.S. adults from the 2001-2002 National Health and Nutrition Examination Survey (NHANES). We applied singular value decomposition to the estimated low-rank matrix to characterize the patterns.
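The final step above, applying singular value decomposition to an estimated low-rank matrix, can be sketched as follows. The function name, the relative tolerance used to count nonzero singular values, and the simulated 50 x 21 matrix are assumptions for illustration only; they are not taken from the article.

```python
import numpy as np

def patterns_from_low_rank(L, rel_tol=1e-8):
    """Return (rank, scores, loadings) of a low-rank matrix via SVD.

    rank     : number of singular values above rel_tol * largest one
    scores   : one row per participant, one column per pattern
    loadings : one row per pattern, one column per chemical
    """
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    rank = int(np.sum(s > rel_tol * s[0])) if s[0] > 0 else 0
    scores = U[:, :rank] * s[:rank]
    loadings = Vt[:rank]
    return rank, scores, loadings

# Hypothetical stand-in for an estimated low-rank matrix:
# 50 participants, 21 chemicals, 3 underlying patterns.
rng = np.random.default_rng(1)
L_est = rng.random((50, 3)) @ rng.random((3, 21))
rank, scores, loadings = patterns_from_low_rank(L_est)
```

Each row of loadings shows how strongly every chemical contributes to one pattern, and each row of scores shows how strongly one participant expresses each pattern.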

Results: PCP-LOD recovered the true number of patterns through cross-validation for all simulations; based on an a priori specified criterion, PCA recovered the true number of patterns in 32% of simulations. PCP-LOD achieved lower relative predictive error than PCA for all simulated data sets with up to 50% of the data <LOD. When 75% of values were <LOD, PCP-LOD outperformed PCA only when noise was low. In the POP mixture, PCP-LOD identified a rank-three underlying structure and separated 6% of values as extreme events. One pattern represented comprehensive exposure to all POPs. The other patterns grouped chemicals based on known structure and toxicity.
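The abstract does not define relative predictive error. One common choice, assumed here, is the Frobenius norm of the difference between the true and estimated matrices divided by the Frobenius norm of the true matrix. The sketch below computes that quantity for a rank-r PCA reconstruction of simulated data; the function names and the simulation settings are hypothetical.

```python
import numpy as np

def relative_error(X_true, X_hat):
    """Assumed definition: ||X_true - X_hat||_F / ||X_true||_F."""
    return np.linalg.norm(X_true - X_hat) / np.linalg.norm(X_true)

def pca_reconstruct(X, r):
    """Rank-r PCA reconstruction: center columns, truncate the SVD, add means back."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean + (U[:, :r] * s[:r]) @ Vt[:r]

# Hypothetical simulation: rank-2 truth plus small Gaussian noise.
rng = np.random.default_rng(2)
truth = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 16))
observed = truth + 0.01 * rng.normal(size=truth.shape)
err_rank1 = relative_error(truth, pca_reconstruct(observed, 1))
err_rank2 = relative_error(truth, pca_reconstruct(observed, 2))
```

Under this definition, a value near zero means the estimate closely matches the simulated truth, so comparing err_rank1 with err_rank2 shows the cost of choosing too few components.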

Discussion: PCP-LOD serves as a useful tool to express multidimensional exposures as consistent patterns that, if found to be related to adverse health, are amenable to targeted public health messaging. https://doi.org/10.1289/EHP10479.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Environmental Exposure / analysis
  • Environmental Pollutants* / toxicity
  • Nutrition Surveys
  • Principal Component Analysis
  • Public Health

Substances

  • Environmental Pollutants