New insights into the spatial distribution of particle number concentrations by applying non-parametric land use regression modelling

Sci Total Environ. 2020 Feb 1:702:134708. doi: 10.1016/j.scitotenv.2019.134708. Epub 2019 Nov 2.

Abstract

Ambient particle number concentration (PNC) varies significantly in time and space within cities, yet complexity and cost prohibit large-scale routine monitoring; as a consequence, there are insufficient data to assess human exposure to, or risk from, these particles. Modelling can improve the quality of such assessments; however, models are generally less capable of predicting spatial variation in PNC than in other ambient pollutants. To advance modelling of PNC, we aimed to develop and compare the performance of parametric and non-parametric machine learning land-use regression (LUR) models for predicting hourly average PNC. We used data from 25 short-term stationary campaigns and five long-term sites during 2009-2012 in the Brisbane Metropolitan Area, Australia. We analysed three particle size ranges of total PNC (<30 nm, <414 nm and <3000 nm) as response variables, and over 150 independent variables, including land use, roads and traffic, population, distance, elevation, meteorology and time of day, as potential predictors of PNC. The LUR models were developed separately for All Days, Nuc Days (when particle nucleation occurred), and No-nuc Days (when no particle nucleation occurred). We selected two algorithms to develop LUR models for PNC: a random forest (RF) model, and a generalised additive model (GAM) based on least angle regression (LARS). The best LARS models for <30 nm, <414 nm and <3000 nm explained 30%, 31%, and 34% of the variance in PNC, respectively, whereas the best RF models were significantly better, explaining 73%, 64%, and 88%, respectively. Using this novel approach, we provided new insights into spatial variation in PNC and also demonstrated that the non-parametric RF model is a better choice for developing a LUR model for PNC because of its robust predictive performance in comparison with the LARS parametric regression model.

Keywords: Air pollution; Land use regression; Machine learning; Particle number concentration; Random forest; Urban Area.
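
The kind of comparison described in the abstract, a parametric regression against a non-parametric tree ensemble scored by variance explained, can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: the data are synthetic (not the Brisbane measurements), the two predictors (hour of day, distance to the nearest road) are hypothetical, ordinary least squares stands in for the LARS-based model, and the small hand-written forest (bootstrap samples, one randomly chosen predictor per split) stands in for a full random forest implementation.

```python
import math
import random

random.seed(0)  # reproducible synthetic data and bootstrap samples

def simulate(n):
    """Hypothetical data: PNC peaks at rush hours and decays away from roads."""
    rows = []
    for _ in range(n):
        hour = random.uniform(0, 24)    # time of day (assumed predictor)
        dist = random.uniform(10, 500)  # metres to nearest road (assumed predictor)
        rush = math.exp(-((hour - 8) ** 2) / 4) + math.exp(-((hour - 17) ** 2) / 4)
        pnc = 5000 + 20000 * rush * math.exp(-dist / 150) + random.gauss(0, 1000)
        rows.append(((hour, dist), pnc))
    return rows

def r2(y_true, y_pred):
    """Coefficient of determination: share of variance explained."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def fit_linear(rows):
    """Ordinary least squares via the normal equations (stand-in for LARS)."""
    X = [(1.0,) + x for x, _ in rows]
    y = [t for _, t in rows]
    k = len(X[0])
    # Augmented system [X^T X | X^T y], solved by Gauss-Jordan elimination.
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(k)]
    for i in range(k):
        piv = A[i][i]
        A[i] = [v / piv for v in A[i]]
        for j in range(k):
            if j != i:
                f = A[j][i]
                A[j] = [a - f * b for a, b in zip(A[j], A[i])]
    coef = [row[k] for row in A]
    return lambda x: coef[0] + sum(c * v for c, v in zip(coef[1:], x))

def build_tree(rows, depth, min_leaf=5):
    """Regression tree: a leaf is a float (mean), a node is a 4-tuple."""
    mean = sum(t for _, t in rows) / len(rows)
    if depth == 0 or len(rows) < 2 * min_leaf:
        return mean
    feat = random.randrange(len(rows[0][0]))  # one random predictor per split
    ordered = sorted(rows, key=lambda r: r[0][feat])
    n, total = len(ordered), sum(t for _, t in ordered)
    left_sum, best_score, best_i = 0.0, -1.0, None
    for i in range(1, n):
        left_sum += ordered[i - 1][1]
        if i < min_leaf or n - i < min_leaf:
            continue
        if ordered[i - 1][0][feat] == ordered[i][0][feat]:
            continue  # cannot split between identical values
        # Maximising this score is equivalent to minimising the squared error.
        score = left_sum ** 2 / i + (total - left_sum) ** 2 / (n - i)
        if score > best_score:
            best_score, best_i = score, i
    if best_i is None:
        return mean
    thr = (ordered[best_i - 1][0][feat] + ordered[best_i][0][feat]) / 2
    return (feat, thr,
            build_tree(ordered[:best_i], depth - 1, min_leaf),
            build_tree(ordered[best_i:], depth - 1, min_leaf))

def predict_tree(node, x):
    while isinstance(node, tuple):
        feat, thr, left, right = node
        node = left if x[feat] <= thr else right
    return node

def fit_forest(rows, n_trees=30, depth=8):
    """Average of trees, each grown on a bootstrap sample of the rows."""
    trees = [build_tree(random.choices(rows, k=len(rows)), depth)
             for _ in range(n_trees)]
    return lambda x: sum(predict_tree(t, x) for t in trees) / n_trees

train, test = simulate(600), simulate(300)
linear, forest = fit_linear(train), fit_forest(train)
y_test = [t for _, t in test]
r2_linear = r2(y_test, [linear(x) for x, _ in test])
r2_forest = r2(y_test, [forest(x) for x, _ in test])
print(f"linear R2 = {r2_linear:.2f}, forest R2 = {r2_forest:.2f}")
```

Because the synthetic PNC depends on an interaction between time of day and distance that no straight-line fit can represent, the tree ensemble should explain more held-out variance than the linear model. This mirrors only the direction of the reported result, not its magnitude, and in practice a maintained library implementation would be preferable to this hand-written forest.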