Data Augmentation for a Virtual-Sensor-Based Nitrogen and Phosphorus Monitoring

Sensors (Basel). 2023 Jan 17;23(3):1061. doi: 10.3390/s23031061.

Abstract

To better control eutrophication, reliable and accurate information on phosphorus and nitrogen loading is needed. However, high-frequency monitoring of these variables is economically impractical. This motivates virtual sensing, which predicts them from easily measurable input variables. While the predictive performance of these data-driven virtual-sensor models depends on adequate training samples (in both quality and quantity), the procurement and operational costs of nitrogen and phosphorus sensors make it impractical to acquire sufficient samples. For this reason, the variational autoencoder, one of the most prominent generative models, was used in the present work to generate synthetic data. The generation capacity of the model was verified using water-quality data from two tributaries of the River Thames in the United Kingdom. Compared to the current state of the art, our novel data augmentation (including proper experimental settings or hyperparameter optimization) reduced the root mean squared errors by 23-63%, with the most significant improvements observed when up to three predictors were used. In comparing the predictive algorithms in terms of predictive accuracy and computational cost, k-nearest neighbors and extremely randomized trees were the best-performing algorithms on average.
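To make the reported metric concrete, the following minimal Python sketch shows how a root mean squared error (RMSE) and a percentage improvement between two models could be computed. The function names and all numeric values are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(rmse_baseline, rmse_new):
    """Percentage reduction in RMSE relative to the baseline."""
    return 100.0 * (rmse_baseline - rmse_new) / rmse_baseline


# Hypothetical measurements and predictions (illustrative values only).
y_true = [1.0, 2.0, 3.0, 4.0]
pred_baseline = [1.5, 2.5, 2.0, 5.0]   # model trained without augmentation
pred_augmented = [1.2, 2.1, 2.8, 4.3]  # model trained with synthetic samples

improvement = relative_improvement(
    rmse(y_true, pred_baseline),
    rmse(y_true, pred_augmented),
)
print(f"RMSE improvement: {improvement:.1f}%")
```

A positive value means the second model has the lower error; the 23-63% range quoted above is a reduction of this kind, measured against the prior state of the art.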

Keywords: deep neural network; eutrophication; machine learning; parameter optimization; soft sensor; surrogate variables; synthetic data; variational autoencoder; water-quality monitoring.

Grants and funding

This research received no external funding.