Cyanobacteria blue-green algae prediction enhancement using hybrid machine learning-based gamma test variable selection and empirical wavelet transform

Environ Sci Pollut Res Int. 2022 Nov;29(51):77157-77187. doi: 10.1007/s11356-022-21201-1. Epub 2022 Jun 8.

Abstract

This study evaluates the usefulness and effectiveness of four machine learning (ML) models for modelling cyanobacteria blue-green algae (CBGA) in two rivers located in the USA. The proposed modelling framework was based on establishing a link between five water quality variables and the concentration of CBGA. For this purpose, artificial neural network (ANN), extreme learning machine (ELM), random forest regression (RFR), and random vector functional link (RVFL) models were developed. First, the four models were developed using only the water quality variables. Second, based on the results of the first stage, a new modelling strategy was introduced that uses signal decomposition as a preprocessing step. The empirical mode decomposition (EMD), the variational mode decomposition (VMD), and the empirical wavelet transform (EWT) were used to decompose the water quality variables into several subcomponents, and the resulting intrinsic mode functions (IMFs) and multiresolution analysis (MRA) components were used as new input variables for the ML models. The results show that (i) among the single models, good predictive accuracy was obtained with the RFR model, which exhibited R and NSE values of ≈0.914 and ≈0.833 for the first station, and ≈0.944 and ≈0.884 for the second station, while the other models, i.e., ANN, RVFL, and ELM, failed to provide a good estimation of the CBGA; (ii) the decomposition methods contributed to a significant improvement in the performance of the individual models; (iii) among the three decomposition methods, the EMD was found to be superior to the VMD and EWT; and (iv) the ANN and RFR were found to be more accurate than the ELM and RVFL models, exhibiting high numerical performance with R and NSE values of ≈0.983 and ≈0.967, and ≈0.989 and ≈0.976, respectively.
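The abstract reports accuracy as R and NSE without defining them. The sketch below assumes the conventional definitions: R as the Pearson correlation coefficient between observed and predicted values, and NSE as the Nash-Sutcliffe efficiency, i.e. one minus the ratio of the residual sum of squares to the sum of squared deviations of the observations from their mean. The function names and the sample values are hypothetical and are not taken from the study.

```python
import numpy as np


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observations and predictions."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal is R.
    return float(np.corrcoef(obs, pred)[0, 1])


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency (assumed definition).

    NSE = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
    A value of 1 is a perfect fit; 0 means the model is no better than
    predicting the mean of the observations.
    """
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    residual = np.sum((obs - pred) ** 2)
    spread = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - residual / spread)


# Hypothetical CBGA concentrations, for illustration only (not study data).
observed = [0.8, 1.1, 1.9, 2.4, 3.0]
predicted = [0.9, 1.0, 2.1, 2.2, 3.1]

print("R   =", round(pearson_r(observed, predicted), 3))
print("NSE =", round(nse(observed, predicted), 3))
```

Because NSE penalises bias and scale errors while R does not, a model can reach a high R with a noticeably lower NSE, which is consistent with the NSE values reported above being lower than the corresponding R values.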

Keywords: ANN; CBGA; ELM; EMD; EWT; Modelling; RFR; RVFL; VMD; Water quality.

MeSH terms

  • Cyanobacteria*
  • Machine Learning
  • Neural Networks, Computer
  • Rivers
  • Wavelet Analysis*