Using machine learning approach to reproduce the measured feature and understand the model-to-measurement discrepancy of atmospheric formaldehyde

Sci Total Environ. 2022 Dec 10;851(Pt 2):158271. doi: 10.1016/j.scitotenv.2022.158271. Epub 2022 Aug 24.

Abstract

Solar absorption spectrometry in the infrared spectral region, using high-resolution Fourier transform infrared (FTIR) spectrometers, is an established and powerful tool in atmospheric science. These observations cannot be performed continuously; clouds, for example, prevent them. Chemical transport models, on the other hand, give continuous data. Their results depend on knowledge of the emission inventories, the chemistry involved, and the meteorological fields, which leads to potential biases between measurements and simulations. In our study we concentrated on formaldehyde (HCHO) and used a machine learning approach to bridge the gap between the observations, which are performed on an irregular time scale and contain measurement gaps, and the model data, which are continuous but have potentially variable biases. The proposed machine learning approach is based on the Light Gradient Boosting Machine (LightGBM) algorithm, is built from GEOS-Chem simulations, meteorological fields, and an emission inventory, and is referred to as the GEOS-Chem-LightGBM model. The established GEOS-Chem-LightGBM model generates HCHO predictions consistent with the ground-based FTIR and satellite (OMI and TROPOMI) observations. To understand the GEOS-Chem model-to-measurement discrepancy, we investigated the contribution of each input variable to the GEOS-Chem-LightGBM HCHO predictions through the SHapley Additive exPlanations (SHAP) approach. We found that the GEOS-Chem model underestimates the sensitivities of the HCHO total column to most photochemical variables, which contributes to the lower amplitudes of the diurnal and seasonal cycles in the GEOS-Chem model. After correcting the model-to-measurement discrepancy, the sensitivities of the HCHO total column to all variables in the GEOS-Chem-LightGBM model came into good agreement with the FTIR observations. As a result, the GEOS-Chem-LightGBM model significantly improves the performance of HCHO predictions compared to GEOS-Chem alone. The proposed GEOS-Chem-LightGBM model can be extended to other atmospheric constituents obtained by various measurement techniques and platforms, and is expected to have wide applications.
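
The attribution idea behind SHAP can be illustrated with a minimal sketch that computes exact Shapley values for a toy predictor. This is not the authors' pipeline: the function `toy_model`, its three inputs (temperature, radiation, isoprene), its coefficients, and the all-zero baseline are hypothetical and chosen only for illustration. The SHAP library applies a tree-specific algorithm to a trained LightGBM model and averages over background data instead of using a single baseline.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline).

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(features):
    # Hypothetical stand-in for a trained regressor (not from the paper):
    # one additive term plus one interaction term.
    temperature, radiation, isoprene = features
    return 2.0 * temperature + 0.5 * radiation * isoprene


x = [1.0, 2.0, 3.0]          # illustrative input values
baseline = [0.0, 0.0, 0.0]   # illustrative reference point
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term is split equally between the two variables that share it. This additivity is what allows per-variable contributions to a prediction to be compared across variables.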

Keywords: FTIR; GEOS-Chem; HCHO; Machine learning; Remote sensing.

MeSH terms

  • Air Pollutants* / analysis
  • Environmental Monitoring / methods
  • Formaldehyde / analysis
  • Machine Learning
  • Meteorology

Substances

  • Air Pollutants
  • Formaldehyde