Machine learning for anomaly detection in cyanobacterial fluorescence signals

Water Res. 2021 Jun 1:197:117073. doi: 10.1016/j.watres.2021.117073. Epub 2021 Mar 19.

Abstract

Many drinking water utilities that draw from waters susceptible to harmful algal blooms (HABs) are implementing monitoring tools that can alert them to the onset of blooms. Some have invested in fluorescence-based online monitoring probes to measure phycocyanin, a pigment found in cyanobacteria, but it is not clear how best to use the data generated. Previous studies have focused on correlating phycocyanin fluorescence with cyanobacteria cell counts. However, not all utilities collect cell count data, so this method cannot always be applied. Instead, this paper proposes a novel machine learning approach for determining when a utility needs to respond to a HAB: identifying anomalies in phycocyanin fluorescence data without the need for corresponding cell counts or biovolume. Four widely used open-source algorithms are evaluated on data collected at four buoys in Lake Erie from 2014 to 2019: local outlier factor (LOF), One-Class Support Vector Machine (SVM), elliptic envelope, and Isolation Forest (iForest). When trained on standardized historical data from 2014 to 2018 and tested on labelled 2019 data collected at each buoy, the One-Class SVM and elliptic envelope models both achieve a maximum average F1 score of 0.86 across the four datasets. Therefore, One-Class SVM and elliptic envelope are promising algorithms for detecting potential HABs using fluorescence data only.
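The workflow summarized above (standardize historical readings, fit each of the four detectors, then score predictions on a labelled hold-out set with F1) can be illustrated with a minimal sketch. It assumes scikit-learn as the open-source implementation, which the abstract does not name, and it uses synthetic one-dimensional readings in place of the Lake Erie buoy data. The hyperparameters (`contamination`, `nu`, `n_neighbors`) and the simulated baseline and bloom levels are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in for phycocyanin fluorescence (hypothetical values, NOT buoy data):
# a low baseline with a small share of bloom-like spikes.
train = np.concatenate([rng.normal(1.0, 0.2, 950), rng.normal(6.0, 1.0, 50)]).reshape(-1, 1)
test_x = np.concatenate([rng.normal(1.0, 0.2, 180), rng.normal(6.0, 1.0, 20)]).reshape(-1, 1)
test_y = np.concatenate([np.zeros(180, dtype=int), np.ones(20, dtype=int)])  # 1 = anomaly

# Standardize using statistics from the historical (training) period only.
scaler = StandardScaler().fit(train)
X_train = scaler.transform(train)
X_test = scaler.transform(test_x)

# The four detectors named in the abstract; all hyperparameters are assumed.
models = {
    "LOF": LocalOutlierFactor(n_neighbors=20, novelty=True, contamination=0.05),
    "One-Class SVM": OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),
    "Elliptic envelope": EllipticEnvelope(contamination=0.05, random_state=0),
    "iForest": IsolationForest(contamination=0.05, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train)  # unsupervised: no labels are used for training
    # scikit-learn detectors return -1 for anomalies and +1 for inliers.
    pred = (model.predict(X_test) == -1).astype(int)
    scores[name] = f1_score(test_y, pred, zero_division=0)

for name, score in scores.items():
    print(f"{name}: F1 = {score:.2f}")
```

Labels enter only at evaluation time, which mirrors the abstract's point that the detectors need no cell counts or biovolume to be trained. The scores printed for this toy data say nothing about performance on real fluorescence records.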

Keywords: Artificial intelligence; Chlorophyll a; Cyanobacteria; Drinking water treatment; Monitoring; Phycocyanin.

MeSH terms

  • Cyanobacteria*
  • Environmental Monitoring
  • Fluorescence
  • Harmful Algal Bloom
  • Lakes
  • Machine Learning