Quantification of chlorophyll-a in typical lakes across China using Sentinel-2 MSI imagery with machine learning algorithm

Sci Total Environ. 2021 Jul 15:778:146271. doi: 10.1016/j.scitotenv.2021.146271. Epub 2021 Mar 8.

Abstract

Lake eutrophication has attracted the attention of governments and the general public. Chlorophyll-a (Chl-a) is a key indicator of algal biomass and eutrophication, and considerable effort has been devoted to establishing accurate algorithms for estimating Chl-a concentrations. In this study, a total of 273 samples were collected from 45 typical lakes across China during 2017-2019. We propose applicable machine learning algorithms (a linear regression model (LR), a support vector machine model (SVM) and a CatBoost model (CB)) that integrate a broad-scale dataset of lake biogeochemical characteristics with MultiSpectral Instrument (MSI) products to seamlessly retrieve the Chl-a concentration. A K-means clustering approach was used to cluster the 273 normalized water-leaving reflectance spectra [Rrs(λ)], extracted from MSI imagery with the Case 2 Regional CoastColour (C2RCC) processor, into three groups. The pH, electrical conductivity (EC), total suspended matter (TSM) and dissolved organic carbon (DOC) differed significantly among the three clusters (p < 0.05), indicating that water quality parameters have an integrated impact on Rrs(λ) spectra. Among the machine learning algorithms, SVM gave a better fit between measured and derived Chl-a (calibration: slope = 0.81, R2 = 0.91; validation: slope = 1.21, R2 = 0.88). In contrast, nine previously documented Chl-a algorithms gave poor results (slope of the linear fit against the 1:1 line < 0.4 and R2 < 0.70) on the same training and test datasets. This demonstrates that machine learning provides a robust model for quantifying Chl-a concentration. Further, when the three k-means Rrs(λ) clusters were considered separately, the Chl-a SVM model gave a better retrieval performance for cluster 1 (slope = 0.71, R2 = 0.78), followed by cluster 3 (slope = 0.77, R2 = 0.64) and cluster 2 (slope = 0.67, R2 = 0.50).
These differences are related to the low TSM and high DOC levels of the cluster 1 and cluster 3 Rrs(λ) spectra, which reduce the influence of particles on the Rrs(λ) signal in the red bands. Our results highlight the quantification of lake Chl-a concentrations using MSI imagery and SVM, which enables large-scale monitoring and is more appropriate for medium and low Chl-a levels. Remote estimation of Chl-a based on artificial intelligence can provide an effective and robust way to monitor lake eutrophication on a macro scale, and offers a better approach to elucidating the response of lake ecosystems to global change.
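
The two numerical steps named in the abstract, grouping Rrs(λ) spectra into three clusters and scoring a retrieval by the slope and R2 of derived against measured Chl-a, can be illustrated with a minimal sketch. It assumes plain Lloyd's k-means with Euclidean distance and an ordinary least-squares fit with intercept; the abstract specifies neither choice, so this is not the study's implementation. The function names kmeans and slope_and_r2 are hypothetical, and every number below is a toy value, not data from the study.

```python
def kmeans(spectra, k, n_iter=100):
    """Plain Lloyd's k-means on equal-length lists of floats."""
    # Assumption: the first k spectra seed the centroids, which keeps
    # this toy example deterministic. Real studies usually use a
    # randomized or k-means++ initialization.
    centroids = [list(s) for s in spectra[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(s, centroids[j])),
            )
            for s in spectra
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [s for s, lab in zip(spectra, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def slope_and_r2(measured, derived):
    """Slope and R2 of an OLS fit (with intercept) of derived on measured."""
    n = len(measured)
    mx = sum(measured) / n
    my = sum(derived) / n
    sxx = sum((x - mx) ** 2 for x in measured)
    syy = sum((y - my) ** 2 for y in derived)
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, derived))
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, r2


# Toy reflectance spectra (three bands each), invented for illustration.
spectra = [
    [0.010, 0.020, 0.010],
    [0.050, 0.060, 0.050],
    [0.100, 0.120, 0.110],
    [0.011, 0.021, 0.012],
    [0.052, 0.058, 0.051],
    [0.098, 0.121, 0.108],
]
labels, centroids = kmeans(spectra, k=3)

# Toy measured Chl-a values and a "derived" series that is an exact
# linear function of them, so the fit is perfect by construction.
measured = [2.0, 5.0, 10.0, 20.0]
derived = [0.8 * m + 1.0 for m in measured]
slope, r2 = slope_and_r2(measured, derived)
```

In this reading, a slope near 1 together with a high R2 means the derived values track the measured ones closely, which is how the calibration and validation figures in the abstract would be interpreted.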

Keywords: Chinese lakes; Chlorophyll-a; K-means; Machine learning; Sentinel-2.

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • China
  • Chlorophyll / analysis
  • Chlorophyll A / analysis
  • Ecosystem
  • Environmental Monitoring
  • Eutrophication
  • Lakes*

Substances

  • Chlorophyll
  • Chlorophyll A