Evaluation of E. coli in sediment for assessing irrigation water quality using machine learning

Sci Total Environ. 2021 Dec 10:799:149286. doi: 10.1016/j.scitotenv.2021.149286. Epub 2021 Jul 28.

Abstract

Fresh produce irrigated with contaminated water poses a substantial risk to human health. This study evaluated whether incorporating sediment information improves the performance of machine learning models that quantify E. coli levels in irrigation water. Field samples were collected from irrigation canals in the Southwest U.S., and meteorological, chemical, and physical water quality variables were recorded along with three additional flow and sediment properties: the concentration of E. coli in sediment, sediment median size, and bed shear stress. Water quality was classified according to whether E. coli concentration exceeded two standard levels: 1 and 126 E. coli colony forming units (CFU) per 100 ml of irrigation water. Two feature sets, one including (FIS) and one excluding (FES) sediment features, were selected using multivariate filter feature selection. Correlation analysis showed that the feature set including sediment features correlated more strongly with the target E. coli standards than the set excluding them. Support vector machine, logistic regression, and ridge classifier models were tested. The support vector machine model performed best for both target standards, and incorporating sediment features improved the performance of all three models. These results indicate that the concentration of E. coli in sediment and bed shear stress are major factors influencing E. coli concentration in irrigation water.
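
The comparison described above can be illustrated with a minimal sketch in Python using scikit-learn. This is not the authors' code or data: the feature counts, the synthetic data generator, the RBF kernel, the 5-fold cross-validation, and the helper names `label_exceedance` and `compare_feature_sets` are all assumptions made for illustration. Only the two thresholds (1 and 126 CFU per 100 ml), the three model families, and the FES/FIS split come from the abstract.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# The two standard levels named in the abstract (CFU per 100 ml).
THRESHOLDS_CFU_PER_100ML = (1.0, 126.0)


def label_exceedance(ecoli_cfu_per_100ml, threshold):
    """Return 1 where the concentration strictly exceeds the threshold, else 0."""
    return (np.asarray(ecoli_cfu_per_100ml) > threshold).astype(int)


def compare_feature_sets(X_fes, X_fis, y, cv=5):
    """Mean cross-validated accuracy of three classifiers on FES and FIS.

    Hypothetical helper: the kernel, solver settings, and CV scheme are
    assumptions, not the settings used in the study.
    """
    models = {
        "svm": SVC(kernel="rbf"),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "ridge_classifier": RidgeClassifier(),
    }
    results = {}
    for name, model in models.items():
        # Standardize features inside each fold before fitting the model.
        pipe = make_pipeline(StandardScaler(), model)
        results[name] = {
            "FES": cross_val_score(pipe, X_fes, y, cv=cv).mean(),
            "FIS": cross_val_score(pipe, X_fis, y, cv=cv).mean(),
        }
    return results


# Synthetic stand-in data (assumption): 4 water-quality/meteorological
# features and 3 sediment/flow features per sample.
rng = np.random.default_rng(0)
n_samples = 200
water_features = rng.normal(size=(n_samples, 4))
sediment_features = rng.normal(size=(n_samples, 3))

# The generator deliberately makes sediment features informative, so any
# FIS advantage seen here is by construction, not evidence for the paper.
log10_ecoli = (
    1.5
    + 0.5 * water_features[:, 0]
    + 1.0 * sediment_features[:, 0]
    + 0.5 * sediment_features[:, 2]
    + rng.normal(scale=0.5, size=n_samples)
)
ecoli_water = 10.0 ** log10_ecoli

X_fes = water_features                                   # excluding sediment
X_fis = np.hstack([water_features, sediment_features])   # including sediment

all_scores = {}
for threshold in THRESHOLDS_CFU_PER_100ML:
    y = label_exceedance(ecoli_water, threshold)
    all_scores[threshold] = compare_feature_sets(X_fes, X_fis, y)
    for name, scores in all_scores[threshold].items():
        print(f"> {threshold:g} CFU/100 ml  {name:20s} "
              f"FES={scores['FES']:.3f}  FIS={scores['FIS']:.3f}")
```

Each standard level is treated as a separate binary classification problem, which matches the abstract's description of classifying water quality by exceedance. Accuracy is used here only for simplicity; with imbalanced exceedance classes another metric may be more appropriate.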

Keywords: Bed shear stress; Classification technique; Kernel logistic regression; Principal component analysis; Sediment; Support vector machine.

MeSH terms

  • Agricultural Irrigation
  • Escherichia coli*
  • Machine Learning
  • Water Microbiology
  • Water Quality*