One-Week-Ahead Prediction of Cyanobacterial Harmful Algal Blooms in Iowa Lakes

Environ Sci Technol. 2023 Dec 12;57(49):20636-20646. doi: 10.1021/acs.est.3c07764. Epub 2023 Nov 27.

Abstract

Cyanobacterial harmful algal blooms (CyanoHABs) pose serious risks to inland water resources. Despite advances in modeling and in our understanding of the associated environmental factors, predicting CyanoHABs remains challenging. Leveraging an integrated water quality data collection effort in Iowa lakes, this study aimed to identify factors associated with hazardous microcystin levels and develop one-week-ahead predictive classification models. Using water samples from 38 Iowa lakes collected between 2018 and 2021, we conducted feature selection considering both linear and nonlinear properties. Subsequently, we developed three model types (Neural Network, XGBoost, and Logistic Regression) with different sampling strategies using the nine selected variables (mcyA_M, TKN, % hay/pasture, pH, mcyA_M:16S, % developed, DOC, dewpoint temperature, and ortho-P). Evaluation metrics demonstrated the strong performance of the Neural Network with oversampling (ROC-AUC 0.940, accuracy 0.861, sensitivity 0.857, specificity 0.857, LR+ 5.993, and 1/LR- 5.993), as well as the XGBoost with downsampling (ROC-AUC 0.944, accuracy 0.831, sensitivity 0.928, specificity 0.833, LR+ 5.557, and 1/LR- 11.569). This study highlighted the intricacies of modeling with limited data and class imbalances, underscoring the importance of continuous monitoring and data collection to improve predictive accuracy. The methodologies employed can also serve as meaningful references for researchers tackling similar challenges in diverse environments.
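
The likelihood ratios reported above can be related to sensitivity and specificity with a short calculation. The sketch below assumes the standard definitions, LR+ = sensitivity / (1 - specificity) and 1/LR- = specificity / (1 - sensitivity); the abstract does not state the formulas used, so this is an assumption, and the function name `likelihood_ratios` is hypothetical. The inputs are the rounded sensitivity and specificity values quoted in the abstract.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, 1/LR-) under the standard definitions (assumed here).

    LR+   = sensitivity / (1 - specificity)
    1/LR- = specificity / (1 - sensitivity)
    """
    lr_pos = sensitivity / (1.0 - specificity)
    inv_lr_neg = specificity / (1.0 - sensitivity)
    return lr_pos, inv_lr_neg


# Sensitivity and specificity as quoted in the abstract (rounded values).
reported = {
    "Neural Network + oversampling": (0.857, 0.857),
    "XGBoost + downsampling": (0.928, 0.833),
}

for model, (sens, spec) in reported.items():
    lr_pos, inv_lr_neg = likelihood_ratios(sens, spec)
    print(f"{model}: LR+ = {lr_pos:.3f}, 1/LR- = {inv_lr_neg:.3f}")
```

Under these assumed definitions, the computed values round to the LR+ and 1/LR- figures given in the abstract. A larger 1/LR- means a negative prediction more strongly lowers the odds of a hazardous microcystin level, which is where the XGBoost model with downsampling differs most from the Neural Network.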

Keywords: XGBoost; class imbalance; classification models; cyanobacterial harmful algal blooms; environmental monitoring; freshwater lakes; logistic regression; microcystin concentration; neural network; predictive modeling.

MeSH terms

  • Cyanobacteria*
  • Harmful Algal Bloom*
  • Iowa
  • Lakes / microbiology