Enhancing environmental data imputation: A physically-constrained machine learning framework

Sci Total Environ. 2024 May 20:926:171773. doi: 10.1016/j.scitotenv.2024.171773. Epub 2024 Mar 24.

Abstract

In water resources management, new computational capabilities have made it possible to develop integrated models that jointly analyze climatic conditions and the water quantity and quality of an entire watershed system. Although the value of this integrated approach has been demonstrated, the limited availability of field data may hinder its applicability by causing high uncertainty in the model response. In this context, before collecting additional data, it is advisable first to assess how much model performance would improve if all available records were fully exploited. This work proposes a novel machine learning framework with physical constraints that can impute a high percentage of missing data across several environmental domains (meteorology, water quantity, water quality) with satisfactory results, evaluated here with the Nash-Sutcliffe efficiency (NSE). In particular, the minimum NSE computed for meteorological variables is 0.72. For hydrometric variables, NSE is always >0.97. More than 78% of the physical water-quality variables are characterized by NSE > 0.45, and >66% of the chemical water-quality variables reach NSE > 0.35. These results demonstrate the proposed framework's effectiveness as a data augmentation tool for improving the performance of integrated environmental modeling.
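The NSE scores quoted above can be read against the standard Nash-Sutcliffe efficiency definition, NSE = 1 - Σ(obs - est)² / Σ(obs - mean(obs))², where 1 indicates a perfect match and 0 indicates an estimate no better than the mean of the observations. The following is a minimal sketch of that standard formula only. The abstract does not describe the evaluation procedure, so the function name, the argument names, and the choice to skip pairs containing NaN are illustrative assumptions rather than the authors' implementation.

```python
import math


def nash_sutcliffe_efficiency(observed, estimated):
    """Standard NSE between observed values and imputed estimates.

    Hypothetical helper for illustration; pairs where either value
    is NaN are skipped (an assumption, not taken from the paper).
    """
    pairs = [
        (o, e)
        for o, e in zip(observed, estimated)
        if not (math.isnan(o) or math.isnan(e))
    ]
    if not pairs:
        return math.nan

    mean_obs = sum(o for o, _ in pairs) / len(pairs)
    # Sum of squared errors between observations and estimates.
    sse = sum((o - e) ** 2 for o, e in pairs)
    # Sum of squared deviations of the observations from their mean.
    spread = sum((o - mean_obs) ** 2 for o, _ in pairs)
    if spread == 0:
        # NSE is undefined when the observations are constant.
        return math.nan
    return 1.0 - sse / spread


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]
    est = [1.0, 2.0, 3.0, 5.0]
    print(nash_sutcliffe_efficiency(obs, est))
```

In an imputation setting, a score like this would typically be computed on records that were deliberately withheld and then re-estimated, although the abstract does not state whether that protocol was used here.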

Keywords: Data imputation; Environmental data; Machine learning; Missing values; Physical constraints.