A Virtual Sensing Concept for Nitrogen and Phosphorus Monitoring Using Machine Learning Techniques

Sensors (Basel). 2022 Sep 27;22(19):7338. doi: 10.3390/s22197338.

Abstract

Harmful cyanobacterial blooms (HCBs) are problematic for drinking water treatment, and some cyanobacterial strains produce toxins that significantly affect human health. To better control eutrophication and HCBs, catchment managers need to continuously track nitrogen (N) and phosphorus (P) in water bodies. However, high-frequency monitoring of these water quality indicators is not economical. In such cases, machine learning techniques may serve as viable alternatives, since they can learn directly from the available surrogate data. In the present work, virtual sensors based on random forest, extremely randomized trees (ET), extreme gradient boosting, k-nearest neighbors, light gradient boosting machine, and bagging regressors were used to predict N and P in two catchments with contrasting land uses. The effects of data scaling and missing value imputation were also assessed, and Shapley additive explanations were used to rank feature importance. A specification book, sensitivity analysis, and best practices for developing virtual sensors are discussed. Results show that ET, the MinMax scaler, and a multivariate imputer were the best predictive model, scaler, and imputer, respectively. The highest predictive performance, reported in terms of R², was 97% in the rural catchment and 82% in the urban catchment.
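The best-performing combination named above (a multivariate imputer, the MinMax scaler, and ET) can be sketched as a single pipeline. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, with `IterativeImputer` standing in for the multivariate imputer. The synthetic data, the four surrogate features, the imputation-before-scaling order, the 10% gap rate, and all hyperparameters are assumptions made for the example.

```python
import numpy as np
# IterativeImputer is experimental in scikit-learn; this import enables it.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic stand-in for surrogate sensor readings (hypothetical features,
# e.g. turbidity, conductivity, temperature, pH). Not the paper's data.
n_samples = 500
X = rng.normal(size=(n_samples, 4))

# Hypothetical nutrient target: nonlinear function of the surrogates plus noise.
y = 2.0 * X[:, 0] + X[:, 1] ** 2 - 0.5 * X[:, 2] + rng.normal(scale=0.1, size=n_samples)

# Blank out roughly 10% of the readings to mimic sensor gaps.
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.10] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X_missing, y, test_size=0.25, random_state=0
)

# Impute -> scale to [0, 1] -> extremely randomized trees.
# Fitting inside a pipeline keeps test data out of the imputer and scaler fits.
virtual_sensor = make_pipeline(
    IterativeImputer(random_state=0),
    MinMaxScaler(),
    ExtraTreesRegressor(n_estimators=200, random_state=0),
)
virtual_sensor.fit(X_train, y_train)

y_pred = virtual_sensor.predict(X_test)
r2 = r2_score(y_test, y_pred)
print(f"R2 on held-out synthetic data: {r2:.2f}")
```

Real monitoring data are time series, so a chronological split would probably give a more honest estimate than the random split used here. The score printed for this synthetic data has no bearing on the R² values reported in the abstract.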

Keywords: accuracy benchmark; baseline model; data scaling; machine learning; missing values handling; soft-sensor; specification book; surrogate parameters; water quality monitoring.

MeSH terms

  • Cyanobacteria*
  • Drinking Water*
  • Humans
  • Machine Learning
  • Nitrogen / analysis
  • Phosphorus / analysis

Substances

  • Drinking Water
  • Nitrogen
  • Phosphorus

Grants and funding

This research received no external funding.