Evaluation of algal species distributions and prediction of cyanophyte cell counts using statistical techniques

Environ Sci Pollut Res Int. 2023 Nov;30(55):117143-117164. doi: 10.1007/s11356-023-30077-8. Epub 2023 Oct 21.

Abstract

Safe drinking water sources are crucial for human health. Consequently, water quality management, including continuous monitoring of water quality and algae at the source, is critical to ensure that safe water is available to local residents. This study aimed to construct statistical prediction models that account for the probability distributions relevant to cyanophyte cell counts and to compare their predictive performance. Water quality parameters were investigated at Juam Lake and Tamjin Lake, representative water sources in the Yeongsan and Seomjin rivers, South Korea. We used water quality monitoring network, algae alert system, and hydraulic and hydrological data measured every 7 days from January 2017 to December 2022, obtained from the Water Environment Information System of the National Institute of Environmental Research. Using data for 2017-2021 as the training set and data for 2022 as the test set, we compared the performance of seven models in predicting cyanophyte cell counts. Environmental factors associated with algae in the water sources were identified from the monitoring data, and a prediction model appropriate for the cyanophyte distribution, which also included the risk of toxicity, was generated. The extreme gradient boosting with random forest model had the best predictive performance for cyanophyte cell counts. The results are expected to facilitate water quality management in various water systems, including water sources.
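The year-based split described in the abstract (2017-2021 for training, 2022 for testing) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (`weekly_dates`, `split_by_year`, `rmse`), the sampling start date, the synthetic seasonal counts, and the mean-value baseline are all assumptions made purely for illustration. The study's data, its seven models, and its evaluation metric are not reproduced here.

```python
from datetime import date, timedelta
import math


def weekly_dates(start, end, step_days=7):
    """Yield sampling dates from start to end (inclusive) at a fixed interval."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=step_days)


def split_by_year(records, test_year):
    """Split (date, value) records: earlier years train, test_year tests."""
    train = [r for r in records if r[0].year < test_year]
    test = [r for r in records if r[0].year == test_year]
    return train, test


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic placeholder counts (NOT data from the study): a simple
# seasonal curve sampled every 7 days, standing in for cell counts.
records = [
    (d, round(500 * (1 + math.sin(2 * math.pi * d.timetuple().tm_yday / 365))))
    for d in weekly_dates(date(2017, 1, 2), date(2022, 12, 26))
]

# Hold out the final year so the test set is strictly later than training.
train, test = split_by_year(records, test_year=2022)

# Hypothetical baseline: predict the training mean for every test date.
train_mean = sum(value for _, value in train) / len(train)
baseline_rmse = rmse([value for _, value in test], [train_mean] * len(test))
```

Splitting by calendar year rather than at random keeps every test observation later than every training observation, so a model is never scored on dates it has effectively seen. A candidate model's test error can then be compared against a simple baseline such as `baseline_rmse`.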

Keywords: Cyanophytes; Random forest model; Redundancy analysis; South Korea; Statistical model; Water quality.

MeSH terms

  • Environmental Monitoring / methods
  • Humans
  • Lakes
  • Models, Statistical
  • Republic of Korea
  • Rivers*
  • Water Quality*