Cyanotoxin level prediction in a reservoir using gradient boosted regression trees: a case study

Environ Sci Pollut Res Int. 2018 Aug;25(23):22658-22671. doi: 10.1007/s11356-018-2219-4. Epub 2018 May 30.

Abstract

Cyanotoxins are toxic compounds produced by cyanobacteria, and they pose a health threat in waters that may be used for drinking or recreation. It is therefore necessary to predict their presence in order to avoid risks. This paper presents a nonparametric machine learning approach that uses a gradient boosted regression tree (GBRT) model to predict cyanotoxin contents from cyanobacterial concentrations determined experimentally in a reservoir in the north of Spain. GBRT models can achieve good predictions in highly nonlinear problems such as this one, where the studied variable shows low cyanotoxin concentrations interspersed with high concentration peaks. Two types of results were obtained. First, the model ranks the input (independent) variables according to their importance in the model. Second, the model achieves high performance; together with its simplicity, this makes the gradient boosted tree method attractive compared with conventional forecasting techniques.
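The two ideas in the abstract, boosting regression trees on residuals and ranking input variables by importance, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes squared-error loss, uses depth-one trees (stumps) for brevity, and ranks features by the total squared-error reduction credited to each feature's splits (a common impurity-based importance). The paper's tree depth, hyperparameters and actual predictor variables are not given here; the three features and the "low background plus peaks" target below are hypothetical synthetic data.

```python
import random


def fit_stump(X, r):
    """Fit a depth-one regression tree (stump) to residuals r by least squares.

    Returns (feature, threshold, left_value, right_value, gain), where gain is
    the reduction in the sum of squared errors, or None if no split exists.
    """
    n, total = len(r), sum(r)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:  # cannot split between equal feature values
                continue
            right_sum = total - left_sum
            # SSE(parent) - SSE(children) for a split with k samples on the left
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[4]:
                best = (j, (lo + hi) / 2, left_sum / k, right_sum / (n - k), gain)
    return best


def predict_stump(stump, x):
    j, threshold, left, right, _ = stump
    return left if x[j] <= threshold else right


def fit_gbrt(X, y, n_trees=200, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump is fit to the residuals,
    which are the negative gradient of 0.5 * (y - F)^2 with respect to F."""
    f0 = sum(y) / len(y)  # initial model: the mean of the target
    pred = [f0] * len(y)
    stumps = []
    importance = [0.0] * len(X[0])
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        stumps.append(stump)
        importance[stump[0]] += stump[4]  # credit the SSE reduction to the split feature
        pred = [p + learning_rate * predict_stump(stump, x) for p, x in zip(pred, X)]
    total_gain = sum(importance) or 1.0
    return f0, learning_rate, stumps, [g / total_gain for g in importance]


def predict_gbrt(model, x):
    f0, learning_rate, stumps, _ = model
    return f0 + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Hypothetical synthetic data: low background values with a sharp peak driven by
# feature 0, a weak linear effect of feature 1, and no effect of feature 2.
random.seed(0)
X = [[random.uniform(0, 10) for _ in range(3)] for _ in range(200)]
y = [0.5 + (8.0 if x[0] > 7 else 0.0) + 0.3 * x[1] + random.gauss(0, 0.2) for x in X]

model = fit_gbrt(X, y)
ranking = sorted(range(3), key=lambda j: -model[3][j])
print("normalized importances:", [round(g, 3) for g in model[3]])
print("ranking (most important first):", ranking)
print("prediction inside / outside the peak:",
      round(predict_gbrt(model, [9, 5, 5]), 2), round(predict_gbrt(model, [2, 5, 5]), 2))
```

The small learning rate (shrinkage) means each tree corrects only part of the current residual, which is the usual trade-off of more trees for better generalization. A library implementation such as scikit-learn's `GradientBoostingRegressor` would replace the hand-written stumps with deeper trees and expose the same kind of ranking through its `feature_importances_` attribute.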

Keywords: Cyanobacteria; Cyanotoxins; Gradient boosting; Harmful algal blooms (HABs); Regression trees; Statistical machine learning techniques.

MeSH terms

  • Bacterial Toxins / analysis*
  • Cyanobacteria / chemistry
  • Lakes / analysis*
  • Machine Learning
  • Regression Analysis
  • Spain
  • Statistics, Nonparametric
  • Water Supply

Substances

  • Bacterial Toxins