Development of new computational machine learning models for longitudinal dispersion coefficient determination: case study of natural streams, United States

Environ Sci Pollut Res Int. 2022 May;29(24):35841-35861. doi: 10.1007/s11356-022-18554-y. Epub 2022 Jan 21.

Abstract

The longitudinal dispersion coefficient (Kx) of natural streams is an essential indicator of pollutant transport, and its accurate determination is very important. Kx is influenced by several parameters, including river hydraulic geometry, sediment properties, and other morphological characteristics, so its calculation is a highly complex engineering problem. In this research, three relatively explored machine learning (ML) models, Random Forest (RF), Gradient Boosting Decision Tree (GTB), and XGboost-Grid, were proposed for Kx determination. The scheme for building the prediction matrix was adopted from the well-established literature. Several input combinations were tested to improve the predictability of Kx. Modeling performance was evaluated under two training-testing data divisions (70-30% and 80-20%). Based on the results, XGboost-Grid gave the best predictions in both the training and testing phases compared with the RF and GTB models. The newly developed machine learning model proved to be an excellent computer-aided technology for Kx simulation.
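The experimental design described above, several input combinations crossed with two data divisions, can be sketched as follows. This is a minimal illustration only, not the authors' code: the predictor names in `FEATURES` are hypothetical placeholders (the abstract does not list the actual inputs), the sample count and random seed are arbitrary, and the model fitting itself (RF, GTB, XGboost-Grid) is left out.

```python
import itertools
import random

# Hypothetical predictor names; the abstract does not list the actual inputs.
FEATURES = ["width", "depth", "velocity", "shear_velocity"]

# Training fractions named in the abstract (70-30% and 80-20%).
TRAIN_FRACTIONS = [0.7, 0.8]


def input_combinations(features, min_size=2):
    """Return every feature subset with at least min_size members."""
    combos = []
    for size in range(min_size, len(features) + 1):
        combos.extend(itertools.combinations(features, size))
    return combos


def split_indices(n_samples, train_fraction, seed=0):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def experiment_grid(features, train_fractions, n_samples):
    """List one experiment per (input combination, data division) pair."""
    grid = []
    for combo in input_combinations(features):
        for fraction in train_fractions:
            train_idx, test_idx = split_indices(n_samples, fraction)
            grid.append({
                "inputs": combo,
                "train_fraction": fraction,
                "n_train": len(train_idx),
                "n_test": len(test_idx),
            })
    return grid


if __name__ == "__main__":
    # n_samples=100 is an arbitrary illustrative size, not the study's dataset.
    for run in experiment_grid(FEATURES, TRAIN_FRACTIONS, n_samples=100):
        print(run)
```

Each entry of the grid would then be passed to every candidate model, so that differences in test-set error can be attributed to the input combination, the data division, or the model itself.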

Keywords: Data division; Input variability; Longitudinal dispersion coefficient; Machine learning.

MeSH terms

  • Machine Learning*
  • Rivers*
  • United States
  • Water Pollution* / analysis