Exploring potential machine learning application based on big data for prediction of wastewater quality from different full-scale wastewater treatment plants

Sci Total Environ. 2022 Aug 1:832:154930. doi: 10.1016/j.scitotenv.2022.154930. Epub 2022 Apr 4.

Abstract

Water pollution generated by intensive anthropogenic activities has emerged as a critical issue for ecosystem balance and livelihoods worldwide. Although optimizing wastewater treatment efficiency is widely regarded as the foremost step in minimizing the pollutants released into the environment, its widespread application has encountered two major problems: first, the significant variation in influent wastewater constituents; second, the complexity of treatment processes within wastewater treatment plants (WWTPs). Based on data collected hourly by real-time sensors in three different full-scale WWTPs (24 h × 365 days × 3 WWTPs × 10 wastewater parameters), this work introduces the potential application of Machine Learning (ML) to predict wastewater quality. Six different ML algorithms, ranging from shallow to deep learning architectures, were examined and compared: Seasonal Autoregressive Integrated Moving Average with eXogenous regressors (SARIMAX), Random Forest (RF), Support Vector Machine (SVM), Gradient Tree Boosting (GTB), Adaptive Neuro-Fuzzy Inference System (ANFIS) and Long Short-Term Memory (LSTM). These models were developed to predict total phosphorus in the outlet (Outlet-TP), which was chosen as the output variable because of rising concerns about eutrophication. Irrespective of the WWTP, SARIMAX consistently demonstrated the best regression performance, as evidenced by the lowest values of Mean Squared Error (MSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) and the highest coefficient of determination (R²). In terms of computational efficiency, SARIMAX exhibited an acceptable computation time, supporting the successful application of this algorithm to Outlet-TP modeling. In contrast, the complex structure of LSTM made it time-consuming and unstable in the presence of noise, while the other shallower architectures, i.e., RF, SVM, GTB, and ANFIS, were unable to handle large datasets with nonlinear and nonstationary behavior. Consequently, this study provides a reliable and accurate approach to forecasting wastewater effluent quality, which is pivotal for the socio-economic aspects of wastewater management.
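
The four criteria used above to rank the models (MSE, MAE, MAPE and R²) can be illustrated with a minimal sketch. It uses the standard textbook definitions of these metrics, not the authors' code; the function name regression_metrics and the Outlet-TP values in the usage example are hypothetical and chosen only for illustration.

```python
from math import fsum


def regression_metrics(observed, predicted):
    """Return MSE, MAE, MAPE (in percent) and R2 for paired series.

    Standard definitions only; this is an illustrative sketch, not the
    evaluation code used in the study.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    if any(o == 0 for o in observed):
        # MAPE divides by each observed value, so zeros are not allowed.
        raise ValueError("MAPE is undefined when an observed value is zero")

    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]

    mse = fsum(e * e for e in errors) / n
    mae = fsum(abs(e) for e in errors) / n
    mape = 100.0 * fsum(abs(e / o) for e, o in zip(errors, observed)) / n

    # R2 = 1 - (residual sum of squares / total sum of squares).
    mean_obs = fsum(observed) / n
    ss_tot = fsum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are identical")
    ss_res = fsum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot

    return {"mse": mse, "mae": mae, "mape": mape, "r2": r2}


# Hypothetical hourly Outlet-TP readings (mg/L) and model predictions.
observed_tp = [0.20, 0.25, 0.30, 0.35]
predicted_tp = [0.22, 0.24, 0.33, 0.33]
print(regression_metrics(observed_tp, predicted_tp))
```

Lower MSE, MAE and MAPE and a higher R² indicate a closer fit, which is the sense in which the abstract reports SARIMAX as the best-performing model.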

Keywords: Deep learning; Machine learning; Wastewater treatment phosphorus; Water pollution.

MeSH terms

  • Big Data
  • Ecosystem
  • Machine Learning
  • Wastewater*
  • Water Purification*

Substances

  • Waste Water