Finite mixture models to characterize and refine air quality monitoring networks

Sci Total Environ. 2014 Jul 1:485-486:292-299. doi: 10.1016/j.scitotenv.2014.03.091. Epub 2014 Apr 12.

Abstract

Background: Existing air quality monitoring programs are not always updated to reflect changing local conditions; as a result, they become less informative over time, failing to detect new sources of pollutants or duplicating information. Furthermore, inadequate maintenance may leave the monitoring equipment unable to provide useful information. To address these issues, a combination of formal statistical methods is used to optimize monitoring resources and to characterize the monitoring networks, introducing new criteria for their refinement.

Methods: Monitoring data were obtained on key pollutants, namely carbon monoxide (CO), nitrogen dioxide (NO2), ozone (O3), particulate matter (PM10) and sulfur dioxide (SO2), from 12 air quality monitoring sites in Seville (Spain) during 2012. A total of 49 data sets were fitted with mixture models of Gaussian distributions using the expectation-maximization (EM) algorithm. To summarize these 49 models, the mean and coefficient of variation were calculated for each mixture, and a hierarchical clustering analysis (HCA) was carried out to study the grouping of the sites according to these statistics. To handle the lack of observational data from sites with unmonitored pollutants, the missing statistical values were imputed using the random forests technique; a principal component analysis (PCA) was then carried out to better understand the relationship between the level of pollution and the classification of monitoring sites. All of the techniques were applied using free, open-source statistical software.
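As an illustration of the first two steps only (EM fitting and the mixture summary statistics), the following is a minimal sketch and not the authors' implementation. It assumes a univariate two-component Gaussian mixture, uses synthetic concentrations rather than the Seville data, and interprets "mean and coefficient of variation of the mixture" as the moments of the overall mixture density; the function names `fit_gmm_em` and `mixture_mean_cv` are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a univariate Gaussian with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gmm_em(data, k=2, n_iter=200, min_sigma=1e-3):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative)."""
    xs = sorted(data)
    n = len(xs)
    # Initialise: means at evenly spaced quantiles, common spread, equal weights.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    spread = math.sqrt(sum((x - centre) ** 2 for x in xs) / n)
    sigmas = [max(spread, min_sigma)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, means, sigmas)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)
    return weights, means, sigmas


def mixture_mean_cv(weights, means, sigmas):
    """Mean and coefficient of variation of the overall mixture density."""
    mean = sum(w * m for w, m in zip(weights, means))
    second_moment = sum(w * (s * s + m * m)
                        for w, m, s in zip(weights, means, sigmas))
    sd = math.sqrt(max(second_moment - mean * mean, 0.0))
    return mean, sd / mean


# Synthetic example (assumption: two regimes, e.g. background and peak levels).
random.seed(0)
sample = ([random.gauss(20.0, 3.0) for _ in range(300)]
          + [random.gauss(45.0, 5.0) for _ in range(200)])

weights, means, sigmas = fit_gmm_em(sample, k=2)
mix_mean, mix_cv = mixture_mean_cv(weights, means, sigmas)
print("weights:", weights)
print("component means:", means)
print("mixture mean and CV:", mix_mean, mix_cv)
```

In a workflow like the one described, one such (mean, CV) pair per site and pollutant would form the feature table passed on to the clustering, imputation and PCA steps; those steps are not sketched here.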

Results and conclusion: One example of source attribution and contribution is analyzed using mixture models, and the potential of mixture models for characterizing pollution trends is discussed. The mixture statistics proved to be a fingerprint for every model; this work presents a novel use of them and represents a promising approach to characterizing mixture models in the air quality management discipline. The imputation technique allowed the missing information for key unmonitored pollutants to be estimated, providing information about unknown pollution levels and suggesting possible new monitoring configurations for this network. The subsequent PCA confirmed the misclassification of one site detected with HCA. The authors consider the stepwise approach used in this work to be promising and applicable to other air monitoring network studies.

Keywords: Air quality monitoring networks; Finite mixture models; Imputation; Missing data; Random forests; Seville.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Air Pollutants / analysis*
  • Air Pollution / statistics & numerical data*
  • Carbon Monoxide / analysis
  • Environmental Monitoring / methods*
  • Models, Chemical*
  • Models, Statistical
  • Nitrogen Dioxide / analysis
  • Ozone / analysis
  • Particulate Matter / analysis
  • Principal Component Analysis
  • Spain
  • Sulfur Dioxide / analysis
  • Vehicle Emissions / analysis

Substances

  • Air Pollutants
  • Particulate Matter
  • Vehicle Emissions
  • Sulfur Dioxide
  • Ozone
  • Carbon Monoxide
  • Nitrogen Dioxide