Decision tree ensemble with Bayesian optimization to predict the spatial dynamics of chlorophyll-a concentration: A case study in Bay of Bengal

Mar Pollut Bull. 2024 Feb:199:115945. doi: 10.1016/j.marpolbul.2023.115945. Epub 2023 Dec 27.

Abstract

An accurate prediction of the spatial distribution of phytoplankton biomass, as represented by chlorophyll-a (CHL-a) concentrations, is important for assessing ecological conditions in the marine environment. This study developed hyperparameter-optimized, decision tree-based machine learning (ML) models to predict the geographical distribution of marine phytoplankton CHL-a in the Bay of Bengal. To predict CHL-a over a large spatial extent, the models were trained and tested on satellite-derived remotely sensed data from 2003 to 2022. These data comprised ocean color features (CHL-a, colored dissolved organic matter, photosynthetically active radiation, and particulate organic carbon) and climatic factors (nighttime sea surface temperature, surface absorbed longwave radiation, and sea level pressure). The results show that the highest CHL-a concentrations occur near the Bay's coastal belts and river estuaries. The analysis revealed that, with the exception of photosynthetically active radiation, the organic components had a stronger positive relationship with CHL-a than the climatic features, which were negatively correlated with it. All of the chosen decision tree methods achieved high R2 and low root mean square error (RMSE) values, and XGBoost outperformed all other models in predicting the geographic distribution of CHL-a. To assess model efficacy on a seasonal basis, the best-performing XGBoost model was validated in the Bay of Bengal region. It performed well in predicting both the spatial distribution of CHL-a and the pixel values during the summer, winter, and monsoon seasons. This study identifies the best-performing of the tested ML models for predicting CHL-a in the Bay of Bengal. It also improves our understanding of CHL-a spatial dynamics and can assist in monitoring marine resources in the Bay of Bengal.
It is worth noting that water quality in the Indian Ocean is highly dynamic. Therefore, additional efforts are needed to test the efficacy of this study's model across different seasons and spatial gradients.
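The abstract ranks the models by the coefficient of determination (R2) and the root mean square error (RMSE). The following is a minimal sketch of how these two metrics are conventionally computed from observed and predicted CHL-a pixel values. It assumes plain Python lists of paired values. The function names and sample numbers are hypothetical illustrations, not code or data from the study.

```python
import math
from typing import Sequence


def _check(observed: Sequence[float], predicted: Sequence[float]) -> None:
    # Both series must pair one prediction with one observation per pixel.
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error: sqrt(mean((obs - pred)^2))."""
    _check(observed, predicted)
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r2(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(observed, predicted)
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance; R2 is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical CHL-a values (mg m^-3) for four pixels.
    observed = [0.2, 0.5, 1.1, 2.4]
    predicted = [0.25, 0.45, 1.0, 2.6]
    print(f"RMSE = {rmse(observed, predicted):.4f}")
    print(f"R2   = {r2(observed, predicted):.4f}")
```

A higher R2 and a lower RMSE both indicate closer agreement between predicted and observed CHL-a, which is the sense in which the abstract compares the decision tree models.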

Keywords: Bay of Bengal; Chlorophyll-a; Climate factors; MODIS; Machine Learning; Remote Sensing.

MeSH terms

  • Bayes Theorem
  • Bays*
  • Chlorophyll / analysis
  • Chlorophyll A / analysis
  • Decision Trees
  • Environmental Monitoring* / methods
  • Phytoplankton
  • Seasons

Substances

  • Chlorophyll A
  • Chlorophyll