Robust predictive modelling of water pollution using biomarker data

Water Res. 2010 May;44(10):3294-308. doi: 10.1016/j.watres.2010.03.006. Epub 2010 Mar 16.

Abstract

This paper describes a methodology for building a predictive model for marine pollution monitoring from low-quality biomarker data. A step-by-step, systematic data analysis approach is presented, resulting in the design of a purely data-driven model able to accurately discriminate between various coastal water pollution levels. Environmental scientists often try to apply machine learning techniques to their data without much success, mostly because they lack experience with the different methods and the required 'under the hood' knowledge. This paper is therefore the result of a collaboration between the machine learning and environmental science communities, presenting a predictive model development workflow and discussing and addressing potential pitfalls and difficulties. The novelty of the modelling approach lies in the successful application of machine learning techniques to high-dimensional, incomplete biomarker data, which to our knowledge has not been done before.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers / analysis
  • Environmental Monitoring / methods
  • Models, Theoretical*
  • Water Pollution*

Substances

  • Biomarkers