Short-term Lake Erie algal bloom prediction by classification and regression models

Water Res. 2023 Apr 1;232:119710. doi: 10.1016/j.watres.2023.119710. Epub 2023 Feb 5.

Abstract

The recent outbreaks of harmful algal blooms (HABs) in the western Lake Erie Basin (WLEB) have drawn tremendous attention to bloom prediction for better control and management. Many weekly to annual bloom prediction models have been reported, but they rely on small datasets, use limited types of input features, are restricted to linear regression or probabilistic models, or require complex process-based computations. To address these limitations, we conducted a comprehensive literature review, compiled a large dataset containing the chlorophyll-a index (from 2002 to 2019) as the output and a novel combination of riverine (the Maumee and Detroit Rivers) and meteorological (WLEB) features as the input, and built machine learning-based classification and regression models for bloom predictions on a 10-day scale. By analyzing the feature importance, we identified the 8 most important features for HAB control, including nitrogen loads, time, water levels, soluble reactive phosphorus load, and solar irradiance. Here, both long- and short-term nitrogen loads were considered for the first time in HAB models for Lake Erie. Based on these features, the 2-, 3-, and 4-level random forest classification models achieved accuracies of 89.6%, 77.0%, and 66.7%, respectively, and the regression model achieved an R² value of 0.69. In addition, long short-term memory (LSTM) was implemented to predict temporal trends of four short-term features (N, solar irradiance, and two water levels) and achieved Nash-Sutcliffe efficiencies of 0.12-0.97. Feeding the LSTM predictions for these features into the 2-level classification model yielded an accuracy of 86.0% for predicting the HABs in 2017-2018, suggesting that we can provide short-term HAB forecasts even when the feature values are not available.
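For readers unfamiliar with the Nash-Sutcliffe efficiency (NSE) reported for the LSTM feature forecasts, the following is a minimal sketch of its standard definition, NSE = 1 - Σ(obs - sim)² / Σ(obs - mean(obs))². The function name and the sample values are illustrative assumptions; they are not code or data from the study.

```python
from statistics import fmean


def nash_sutcliffe_efficiency(observed, simulated):
    """Standard NSE: 1 minus the ratio of squared error to the
    squared deviation of the observations about their own mean."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = fmean(observed)
    # Sum of squared errors between observations and model output.
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of observations from their mean.
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("NSE is undefined when observations are constant")
    return 1.0 - sse / spread


# Hypothetical values for illustration only (not from the paper's dataset).
obs = [2.0, 4.0, 6.0, 8.0]
perfect = nash_sutcliffe_efficiency(obs, obs)          # model matches exactly
mean_only = nash_sutcliffe_efficiency(obs, [5.0] * 4)  # model predicts the mean
```

An NSE of 1 indicates a perfect match, and an NSE of 0 indicates a forecast no better than always predicting the mean of the observations, so the reported range of 0.12-0.97 spans features forecast only slightly better than the observed mean to features forecast almost exactly.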

Keywords: Bloom forecast; Feature selection; Long short-term memory; Machine learning; Random forest; Time series modeling.

Publication types

  • Review

MeSH terms

  • Chlorophyll A
  • Environmental Monitoring
  • Harmful Algal Bloom*
  • Lakes*
  • Models, Statistical
  • Nitrogen

Substances

  • Chlorophyll A
  • Nitrogen