Machine learning algorithms for high-resolution prediction of spatiotemporal distribution of air pollution from meteorological and soil parameters

Hai Tao; Ali H Jawad; A H Shather; Zainab Al-Khafaji; Tarik A Rashid; Mumtaz Ali; Nadhir Al-Ansari; Haydar Abdulameer Marhoon; Shamsuddin Shahid; Zaher Mundher Yaseen

doi:10.1016/j.envint.2023.107931

Environ Int. 2023 May:175:107931. doi: 10.1016/j.envint.2023.107931. Epub 2023 Apr 15.

Affiliations

¹ School of Computer and Information, Qiannan Normal University for Nationalities, Duyun, Guizhou 558000, China; State Key Laboratory of Public Big Data, Guizhou University, Guizhou, Guiyang 550025, China; Institute for Big Data Analytics and Artificial Intelligence (IBDAAI), Universiti Teknologi MARA, 40450 Shah Alam, Selangor, Malaysia. Electronic address: haitao@sgmtu.edu.cn.
² Faculty of Applied Sciences, Universiti Teknologi MARA, 40450 Shah Alam, Selangor, Malaysia. Electronic address: ali288@uitm.edu.my.
³ Department of Computer Technology Engineering, Engineering Technical College, University of Alkitab, Iraq. Electronic address: akhsh@uoalkitab.edu.iq.
⁴ Department of Building and Construction Technologies Engineering, AL-Mustaqbal University College, Hillah 51001, Iraq. Electronic address: zainabal-khafaji@uomus.edu.iq.
⁵ Computer Science and Engineering Department, University of Kurdistan Hewler, Erbil, KR, Iraq. Electronic address: tarik.ahmed@ukh.edu.krd.
⁶ UniSQ College, University of Southern Queensland, QLD 4350, Australia. Electronic address: Mumtaz.Ali@usq.edu.au.
⁷ Dept. of Civil, Environmental and Natural Resources Engineering, Lulea Univ. of Technology, Lulea T3334, Sweden. Electronic address: nadhir.alansari@ltu.se.
⁸ Information and Communication Technology Research Group, Scientific Research Center, Al-Ayen University, Thi-Qar, Iraq; College of Computer Sciences and Information Technology, University of Kerbala, Karbala, Iraq. Electronic address: haydar@alayen.edu.iq.
⁹ Department of Hydraulics and Hydrology, School of Civil Engineering, Faculty of Engineering, Universiti Teknologi Malaysia (UTM), 81310 Skudai, Johor, Malaysia. Electronic address: sshahid@utm.my.
¹⁰ Civil and Environmental Engineering Department, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia; Interdisciplinary Research Center for Membranes and Water Security, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia. Electronic address: z.yaseen@kfupm.edu.sa.

PMID: 37119651
DOI: 10.1016/j.envint.2023.107931

Abstract

This study uses machine learning (ML) models for high-resolution (0.1°×0.1°) prediction of the concentration of fine particulate matter (PM_2.5), the air pollutant most harmful to human health, from meteorological and soil data. Iraq was chosen as the study area to implement the method. Different lags and changing patterns of four European Reanalysis (ERA5) meteorological variables (rainfall, mean temperature, wind speed and relative humidity) and one soil parameter (soil moisture) were screened to select a suitable set of predictors using a non-greedy algorithm known as simulated annealing (SA). The selected predictors were used to simulate the temporal and spatial variability of PM_2.5 concentration over Iraq during the early summer (May-July), the most polluted months, using three advanced ML models: extremely randomized trees (ERT), stochastic gradient descent backpropagation (SGD-BP) and long short-term memory (LSTM), each integrated with a Bayesian optimizer. The spatial distribution of the annual average PM_2.5 revealed that the entire population of Iraq is exposed to a pollution level above the standard limit. The changes in temperature and soil moisture, together with the mean wind speed and humidity of the month before the early summer, can predict the temporal and spatial variability of PM_2.5 over Iraq during May-July. LSTM performed best, with a normalized root-mean-square error (NRMSE) of 13.4% and a Kling-Gupta efficiency (KGE) of 0.89, compared to 16.02% and 0.81 for SGD-BP and 17.9% and 0.74 for ERT. The LSTM could also reconstruct the observed spatial distribution of PM_2.5, with MapCurve and Cramer's V values of 0.95 and 0.91, compared to 0.9 and 0.86 for SGD-BP and 0.83 and 0.76 for ERT. The study provides a methodology for forecasting the spatial variability of PM_2.5 concentration at high resolution during the peak pollution months from freely available data, which can be replicated in other regions to generate high-resolution PM_2.5 forecasting maps.
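
For readers unfamiliar with the two skill scores quoted above, the following is a minimal sketch of how NRMSE and KGE are commonly computed. It is not the authors' code. The abstract does not state how the RMSE was normalized, so dividing by the observed mean is an assumption here (range or standard deviation are also used in practice); the KGE follows the standard 2009 formulation based on correlation, variability ratio and bias ratio. The function names are hypothetical.

```python
import numpy as np


def nrmse_percent(obs, sim):
    """RMSE divided by the observed mean, in percent.

    Assumption: mean normalization. The abstract does not say which
    normalization the study used.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    rmse = np.sqrt(np.mean((sim - obs) ** 2))
    return 100.0 * rmse / np.mean(obs)


def kling_gupta_efficiency(obs, sim):
    """KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2).

    r     : Pearson correlation between observed and simulated values
    alpha : ratio of simulated to observed standard deviation
    beta  : ratio of simulated to observed mean
    A perfect simulation gives KGE = 1.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = np.std(sim) / np.std(obs)
    beta = np.mean(sim) / np.mean(obs)
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)
```

Both functions take paired arrays of observed and predicted PM_2.5 values, for example one value per grid cell or per time step. Lower NRMSE and higher KGE indicate better agreement, which is the sense in which the LSTM scores (13.4% and 0.89) are reported as the best of the three models.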

Keywords: Air quality prediction; Arid climate; Machine learning; PM(2.5) concentration; Simulated annealing.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Air Pollutants* / analysis
Air Pollution* / analysis
Algorithms
Bayes Theorem
Environmental Monitoring / methods
Humans
Machine Learning
Particulate Matter / analysis

Substances

Air Pollutants
Particulate Matter