Modeling the risk of water pollution by pesticides from imbalanced data

Environ Sci Pollut Res Int. 2018 Jul;25(19):18781-18792. doi: 10.1007/s11356-018-2099-7. Epub 2018 Apr 30.

Abstract

Pollution of groundwater and surface water by pesticides is a serious ecological issue that must be properly addressed. Most existing water pollution models are mechanistic mathematical models. Although they have contributed significantly to understanding pesticide transfer processes, they are difficult to validate because of their complexity, the subjectivity users introduce when parameterizing them, and the lack of empirical data for validation. In addition, data describing water pollution by pesticides are in most cases highly imbalanced: strict regulations on pesticide application mean that only a few pollution events occur. In this study, we propose using data mining to build models that assess the risk of water pollution by pesticides in outflow water from drained fields. Unlike mechanistic models, models generated by data mining are based on easily obtainable empirical data, and their parameterization is not influenced by the subjectivity of ecological modelers. We used empirical data from field trials at the La Jaillière experimental site in France and applied the random forests algorithm to build models that classify pesticide application events as "risky" or "not-risky". To address the class imbalance in the data, we used cost-sensitive learning and several measures of predictive performance. Despite the strong imbalance between risky and not-risky application events, the resulting models make reliable predictions. The proposed approach can readily be applied to other ecological modeling problems that involve empirical data with highly imbalanced classes.
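As a rough illustration of how class imbalance is commonly handled, here is a minimal Python sketch (standard library only) of three generic ingredients: inverse-frequency class weights, a minimum-expected-cost decision threshold for cost-sensitive classification, and performance measures that, unlike plain accuracy, are not dominated by the majority class. It is not the authors' implementation. The abstract does not state which cost matrix or which measures were used, and the label encoding (1 = "risky"), the costs, the function names, and the toy labels are all hypothetical. The random forest itself is omitted; its output is assumed to be an estimated probability that an event is risky, such as the fraction of trees voting "risky".

```python
from collections import Counter

# Hypothetical encoding: 1 = "risky" application event, 0 = "not-risky".


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * n_in_class), so rare
    classes count more during training (the "balanced" heuristic)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * m) for cls, m in counts.items()}


def min_cost_threshold(cost_fp, cost_fn):
    """Probability above which predicting "risky" has lower expected cost:
    p * cost_fn > (1 - p) * cost_fp  <=>  p > cost_fp / (cost_fp + cost_fn)."""
    return cost_fp / (cost_fp + cost_fn)


def predict_risky(probs, threshold):
    """Turn estimated P(risky) values (e.g. vote fractions) into 0/1 labels."""
    return [1 if p > threshold else 0 for p in probs]


def imbalance_metrics(y_true, y_pred):
    """Confusion-matrix measures that expose poor minority-class performance."""
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    fn = pairs.count((1, 0))
    fp = pairs.count((0, 1))
    tn = pairs.count((0, 0))
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity on "risky"
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Toy usage with hypothetical labels (not data from the study).
y_true = [0] * 18 + [1] * 2
print(inverse_frequency_weights(y_true))  # minority class gets weight 5.0
print(min_cost_threshold(cost_fp=1, cost_fn=9))  # 0.1
always_safe = [0] * len(y_true)
m = imbalance_metrics(y_true, always_safe)
print(m["accuracy"], m["recall"], m["balanced_accuracy"])  # 0.9 0.0 0.5
```

The toy usage shows why accuracy alone is misleading with rare pollution events: a model that never predicts "risky" reaches 0.9 accuracy on these labels but only 0.5 balanced accuracy and zero recall. Raising the cost of a missed risky event (here a hypothetical 9:1 ratio) lowers the decision threshold so that more events are flagged. In scikit-learn, a comparable weighting is available through `RandomForestClassifier(class_weight="balanced")`, although the abstract does not say which software the authors used.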

Keywords: Agriculture; Data mining; Imbalanced empirical data; Predictive modeling; Risk assessment; Water pollution.

MeSH terms

  • Agriculture
  • Data Analysis
  • France
  • Models, Theoretical
  • Pesticides / analysis*
  • Risk
  • Water Pollutants, Chemical / analysis*

Substances

  • Pesticides
  • Water Pollutants, Chemical