Data-Driven Machine Learning in Environmental Pollution: Gains and Problems

Xian Liu; Dawei Lu; Aiqian Zhang; Qian Liu; Guibin Jiang

doi:10.1021/acs.est.1c06157

Environ Sci Technol. 2022 Feb 15;56(4):2124-2133. doi: 10.1021/acs.est.1c06157. Epub 2022 Jan 27.

Authors

Xian Liu¹, Dawei Lu¹, Aiqian Zhang¹ ² ³ ⁴, Qian Liu¹ ³ ⁴, Guibin Jiang¹ ²

Affiliations

¹ State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, People's Republic of China.
² School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310012, People's Republic of China.
³ College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100190, People's Republic of China.
⁴ Institute of Environment and Health, Jianghan University, Wuhan 430056, People's Republic of China.

PMID: 35084840
DOI: 10.1021/acs.est.1c06157

Abstract

The complexity and dynamics of the environment make it extremely difficult to directly predict and trace temporal and spatial changes in pollution. Over the past decade, the unprecedented accumulation of data, the growth of high-performance computing power, and the rise of diverse machine learning (ML) methods have provided new opportunities for environmental pollution research. ML has been used in satellite data processing to obtain ground-level concentrations of atmospheric pollutants, in pollution source apportionment, and in spatial distribution modeling of water pollutants. However, in contrast to the active use of ML in chemical toxicity prediction, advanced algorithms such as deep neural networks are still underused in studies of the environmental processes of pollutants. In addition, over 40% of the environmental applications of ML concern air pollution, and the range and acceptance of ML in other areas of environmental science remain to be expanded. Using ML methods to revolutionize environmental science and its problem-solving scenarios has its own challenges. Several issues should be taken into consideration, such as the tradeoff between model performance and interpretability, the prerequisites of ML models, model selection, and data sharing.

Keywords: Artificial intelligence; Big data; Environmental processes; Machine learning; Pollution.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Air Pollutants* / analysis
Air Pollution* / analysis
Algorithms
Environmental Monitoring / methods
Environmental Pollutants*
Machine Learning

Substances

Air Pollutants
Environmental Pollutants