Machine Learning-Assisted, Process-Based Quality Control for Detecting Compromised Environmental Sensors

Environ Sci Technol. 2023 Nov 21;57(46):18058-18066. doi: 10.1021/acs.est.3c00360. Epub 2023 Aug 15.

Abstract

Machine learning (ML) techniques promise to revolutionize environmental research and management, but collecting the necessary volumes of high-quality data remains challenging. Environmental sensors are often deployed under harsh conditions, requiring labor-intensive quality assurance and quality control (QAQC) processes. The need for manual QAQC is a major impediment to the scalability of these sensor networks. However, existing techniques for automated QAQC make strong assumptions about the noise profiles of the data they filter, and these assumptions do not necessarily hold for broadly deployed environmental sensors. Toward the goal of increasing the volume of high-quality environmental data, we introduce an ML-assisted QAQC methodology that is robust to data with a low signal-to-noise ratio. Our approach embeds sensor measurements into a dynamical feature space and trains a binary classification algorithm (a support vector machine) to detect deviations from expected process dynamics; such deviations indicate that a sensor has become compromised and requires maintenance. This strategy enables the automated detection of a wide variety of nonphysical signals. We apply the methodology to three novel data sets produced by 136 low-cost environmental sensors (stream level, drinking water pH, and drinking water electroconductivity) deployed by our group across 250,000 km² in Michigan, USA. The proposed methodology achieved accuracy scores of up to 0.97 and consistently outperformed state-of-the-art anomaly detection techniques.
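The core idea (embed readings in a dynamical feature space, then train a support vector machine to separate normal from compromised behavior) can be illustrated with a small, dependency-free sketch. Everything here is an assumption for illustration, not the authors' implementation: the embedding (absolute first and second differences of the signal), the linear SVM trained by Pegasos-style stochastic subgradient descent on the hinge loss, the majority-vote flagging rule, the hypothetical helper names (`embed`, `train_linear_svm`, `predict`, `sensor_is_compromised`), the parameters `lam`, `epochs`, and `threshold`, and the synthetic sine-wave data.

```python
import math
import random


def embed(series):
    """Hypothetical dynamical embedding (an assumption, not the paper's exact
    features): map each reading x_t with t >= 2 to the magnitudes of its first
    and second differences, a crude proxy for rate and acceleration."""
    features = []
    for t in range(2, len(series)):
        first = series[t] - series[t - 1]
        second = series[t] - 2 * series[t - 1] + series[t - 2]
        features.append((abs(first), abs(second)))
    return features


def train_linear_svm(samples, labels, lam=0.01, epochs=20, seed=0):
    """Linear SVM fitted by Pegasos-style stochastic subgradient descent on the
    L2-regularized hinge loss. Labels: +1 = compromised, -1 = normal.
    A constant 1.0 feature is appended so the bias is learned as a weight."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(samples, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Regularization step: shrink the weights toward zero.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss step: only samples inside the margin move the weights.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (compromised) or -1 (normal) for one embedded point."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score > 0 else -1


def sensor_is_compromised(w, series, threshold=0.5):
    """Hypothetical rule: flag the sensor when more than `threshold` of its
    embedded points fall on the compromised side of the decision boundary."""
    points = embed(series)
    flagged = sum(1 for p in points if predict(w, p) == 1)
    return flagged / len(points) > threshold


# Synthetic illustration only: a smooth "physical" signal versus the same
# signal with a nonphysical sample-to-sample oscillation superimposed.
def normal_series(n, phase=0.0):
    return [math.sin(t / 20 + phase) for t in range(n)]


def compromised_series(n, phase=0.0):
    return [math.sin(t / 20 + phase) + 0.5 * (-1) ** t for t in range(n)]


normal_pts = embed(normal_series(200))
bad_pts = embed(compromised_series(200))
w = train_linear_svm(normal_pts + bad_pts,
                     [-1] * len(normal_pts) + [1] * len(bad_pts))
print(sensor_is_compromised(w, normal_series(100, phase=1.0)))
print(sensor_is_compromised(w, compromised_series(100, phase=1.0)))
```

The choice of embedding does most of the work here: a stuck or flatlined sensor has near-zero differences and would look "normal" in this particular feature space, so catching that failure mode would require additional dynamical features.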

Keywords: automated data validation; data quality control and assurance; environmental sensors; machine learning; wireless sensor networks.

MeSH terms

  • Algorithms
  • Drinking Water*
  • Machine Learning
  • Michigan

Substances

  • Drinking Water