Missing Value Imputation of Wireless Sensor Data for Environmental Monitoring

Sensors (Basel). 2024 Apr 10;24(8):2416. doi: 10.3390/s24082416.

Abstract

Over the past few years, the scale of sensor networks has greatly expanded. This generates extensive spatiotemporal datasets, which form a crucial information resource in numerous fields, ranging from sports and healthcare to environmental science and surveillance. Unfortunately, these datasets often contain missing values due to systematic or inadvertent sensor misoperation. This incompleteness hampers subsequent data analysis, yet filling in the missing observations is a challenging problem. This is especially the case when both the temporal correlation between readings from a single sensor and the spatial correlation between sensors are important. Here, we apply and evaluate 12 imputation methods to complete the missing values in a dataset originating from large-scale environmental monitoring. As part of a large citizen science project, IoT-based microclimate sensors were deployed for six months in 4400 gardens across the region of Flanders, generating 15-min recordings of temperature and soil moisture. Methods based on spatial recovery as well as time-based imputation were evaluated, including Spline Interpolation, MissForest, MICE, MCMC, M-RNN, BRITS, and others. The performance of these imputation methods was evaluated for different proportions of missing data (ranging from 10% to 50%), as well as for a realistic missing value scenario. Techniques leveraging the spatial features of the data tend to outperform the time-based methods, with matrix completion techniques providing the best performance. Our results therefore provide a tool to maximize the benefit from costly, large-scale environmental monitoring efforts.
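
The evaluation across missing-data proportions described above can be illustrated with a minimal sketch. It is not the authors' code: the data are a synthetic one-day temperature curve at 15-min resolution, values are hidden completely at random, plain linear interpolation stands in for a time-based method, and RMSE on the hidden points is assumed as the error measure. All function names are hypothetical.

```python
import math
import random


def mask_at_random(series, fraction, rng):
    """Hide `fraction` of the points at random; the two endpoints stay observed."""
    n_hidden = int(round(fraction * len(series)))
    hidden = set(rng.sample(range(1, len(series) - 1), n_hidden))
    masked = [None if i in hidden else v for i, v in enumerate(series)]
    return masked, sorted(hidden)


def linear_interpolate(masked):
    """Fill each gap linearly between its nearest observed neighbours in time."""
    filled = list(masked)
    observed = [i for i, v in enumerate(masked) if v is not None]
    for left, right in zip(observed, observed[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            filled[i] = (1 - w) * masked[left] + w * masked[right]
    return filled


def rmse(truth, filled, hidden):
    """Root-mean-square error, computed only on the artificially hidden points."""
    return math.sqrt(sum((truth[i] - filled[i]) ** 2 for i in hidden) / len(hidden))


# Assumption: synthetic data, one day of 15-min readings (96 points) on a daily cycle.
truth = [15 + 5 * math.sin(2 * math.pi * t / 96) for t in range(96)]
rng = random.Random(0)  # fixed seed so the run is repeatable

scores = {}
for fraction in (0.1, 0.2, 0.3, 0.4, 0.5):
    masked, hidden = mask_at_random(truth, fraction, rng)
    filled = linear_interpolate(masked)
    scores[fraction] = rmse(truth, filled, hidden)
    print(f"{int(fraction * 100)}% missing: RMSE = {scores[fraction]:.4f}")
```

Because the ground truth is known for every hidden point, the same loop could score any other imputation method by swapping out `linear_interpolate`. A spatial method would additionally need readings from neighbouring sensors, which this single-series sketch does not model.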

Keywords: environmental monitoring; imputation; missing data; time series; wireless sensor networks.