New insights into the spatial distribution of particle number concentrations by applying non-parametric land use regression modelling

Md Mahmudur Rahman; Jayanandana Karunasinghe; Sam Clifford; Luke D Knibbs; Lidia Morawska

doi:10.1016/j.scitotenv.2019.134708

Sci Total Environ. 2020 Feb 1:702:134708. doi: 10.1016/j.scitotenv.2019.134708. Epub 2019 Nov 2.

Authors

Md Mahmudur Rahman¹, Jayanandana Karunasinghe², Sam Clifford³, Luke D Knibbs⁴, Lidia Morawska⁵

Affiliations

¹ International Laboratory for Air Quality and Health, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia; Climate and Atmospheric Science, Department of Planning, Industry and Environment, 480 Weeroona Road, Lidcombe, NSW 2141, Australia.
² International Laboratory for Air Quality and Health, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia.
³ International Laboratory for Air Quality and Health, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia; Department of Infectious Disease Epidemiology, London School of Hygiene & Tropical Medicine, Keppel Street, London WC1E 7HT, UK.
⁴ School of Public Health, The University of Queensland, Herston, QLD 4006, Australia.
⁵ International Laboratory for Air Quality and Health, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia. Electronic address: l.morawska@qut.edu.au.

PMID: 31715399

Abstract

Ambient particle number concentration (PNC) varies significantly in time and space within cities, yet complexity and cost prohibit large-scale routine monitoring; as a consequence, there are insufficient data to assess human exposure to, or risk from, these particles. The quality of assessments can be improved by modelling; however, models are generally less capable of predicting spatial variation in PNC than in other ambient pollutants. To advance modelling of PNC, we aimed to develop and compare the performance of parametric and non-parametric machine learning land-use regression (LUR) models to predict hourly average PNC. We used data from 25 short-term stationary campaigns and five long-term sites during 2009-2012 in the Brisbane Metropolitan Area, Australia. We analysed three particle size ranges of total PNC (<30 nm, <414 nm and <3000 nm) as response variables, and over 150 independent variables, including land use, roads and traffic, population, distance, elevation, meteorology and time of day, as potential predictors of PNC. The LUR models were developed separately for All Days, Nuc Days (when particle nucleation occurred), and No-nuc Days (when no particle nucleation occurred). We selected two algorithms to develop LUR models for PNC: a random forest (RF) model, and a generalised additive model (GAM) based on least angle regression (LARS). The best LARS models for the <30 nm, <414 nm and <3000 nm size ranges explained 30%, 31%, and 34%, respectively, whereas the best RF models were significantly better, explaining 73%, 64%, and 88%, respectively. Using this novel approach, we provided new insights into spatial variation in PNC and also demonstrated that the non-parametric RF model is a better choice for developing a LUR model for PNC because of its robust predictive performance in comparison with the LARS parametric regression model.

Keywords: Air pollution; Land use regression; Machine learning; Particle number concentration; Random forest; Urban Area.
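
The comparison described in the abstract, a LARS-based regression against a random forest, can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses scikit-learn's plain `Lars` estimator rather than a GAM built on LARS, and the data are synthetic. The three predictors (`traffic`, `wind`, `hour`), the response formula, and the function name `compare_models` are all hypothetical and chosen only so that the response depends non-linearly on the predictors. It assumes that NumPy and scikit-learn are installed, and that held-out R² is an acceptable stand-in for the "explained" percentages reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lars
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def compare_models(seed=0, n_samples=600):
    """Fit LARS and a random forest on synthetic data; return held-out R² for each."""
    rng = np.random.default_rng(seed)

    # Hypothetical, standardised predictors (NOT the study's variables).
    traffic = rng.uniform(-1, 1, n_samples)
    wind = rng.uniform(-1, 1, n_samples)
    hour = rng.uniform(-1, 1, n_samples)
    X = np.column_stack([traffic, wind, hour])

    # Synthetic response with curvature and an interaction, plus noise.
    y = (4 * traffic**2 + 2 * traffic * wind + np.cos(3 * hour)
         + rng.normal(0, 0.2, n_samples))

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )

    # Linear (parametric) model fitted by least angle regression.
    lars = Lars().fit(X_train, y_train)

    # Non-parametric ensemble of regression trees.
    rf = RandomForestRegressor(n_estimators=200, random_state=seed)
    rf.fit(X_train, y_train)

    lars_r2 = r2_score(y_test, lars.predict(X_test))
    rf_r2 = r2_score(y_test, rf.predict(X_test))
    return lars_r2, rf_r2


if __name__ == "__main__":
    lars_r2, rf_r2 = compare_models()
    print(f"LARS R^2: {lars_r2:.2f}  RF R^2: {rf_r2:.2f}")
```

Because the synthetic response has no linear dependence on any single predictor, the linear model has little to fit while the tree ensemble can approximate the curvature and the interaction. This shows only the general mechanism by which a non-parametric model can outperform a linear one; it does not reproduce the study's results.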