Hydrogeochemical and sediment parameters improve prediction accuracy of arsenic-prone groundwater in random forest machine-learning models

Sci Total Environ. 2023 Nov 1;897:165511. doi: 10.1016/j.scitotenv.2023.165511. Epub 2023 Jul 12.

Abstract

The relative importance of groundwater geochemical parameters and sediment characteristics in predicting groundwater arsenic distributions has rarely been documented. To address this, we built a random forest machine-learning model to predict groundwater arsenic distributions in the Hetao Basin, China, using 22 variables describing climate, topographic features, soil properties, sediment characteristics, groundwater geochemistry, and hydraulic gradients for 492 groundwater samples. The model captured the patchy distribution of groundwater arsenic concentrations in the basin, with an AUC of 0.84. Fe(II) was the most important variable for predicting groundwater arsenic concentrations, supporting the interpretation that arsenic enrichment in groundwater results from the reductive dissolution of Fe(III) oxides. The high relative importance of SO₄²⁻ indicates that sulfate reduction also favors groundwater arsenic enrichment in inland basins. Climate variables, sediment characteristics, and soil properties also played secondary roles in predicting groundwater arsenic concentrations. Two alternative models, which excluded groundwater geochemical and/or sediment parameters, predicted much worse than the model that included all variables. This highlights the importance of groundwater geochemical and sediment variables for improving the precision and accuracy of predictions. Future studies should explore methods for building high-precision random forest prediction models from a limited number of groundwater and sediment samples.

Keywords: Arsenic; Groundwater; Hetao Basin; Prediction; Random forest model.
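The abstract evaluates the classifier by AUC and ranks predictors by relative importance, but gives no implementation details. The following is a minimal, dependency-free Python sketch, not the authors' code, of two of those ideas. AUC is computed as the probability that a randomly chosen arsenic-prone sample receives a higher score than a randomly chosen non-prone one. Importance is computed as the mean AUC drop when one predictor column is shuffled. Permutation importance is an assumed stand-in: the abstract does not state which importance measure the random forest used. The names `roc_auc` and `permutation_importance`, and the toy features (a hypothetical Fe(II)-like column and a noise column), are illustrative only. `predict` stands for any fitted model's scoring function, for example a random forest's predicted probability for the arsenic-prone class.

```python
import random
from typing import Callable, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win (Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_importance(
    predict: Callable[[list[list[float]]], list[float]],
    X: list[list[float]],
    y: list[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> list[float]:
    """Mean drop in AUC when a single feature column is randomly shuffled.

    Assumption: a model-agnostic permutation importance, used here as a
    stand-in for whatever importance measure the original study reported.
    """
    rng = random.Random(seed)
    baseline = roc_auc(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - roc_auc(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75

    # Hypothetical toy data: column 0 plays an "Fe(II)-like" role that fully
    # determines the label; column 1 is pure noise.
    data_rng = random.Random(1)
    X = [[data_rng.random(), data_rng.random()] for _ in range(60)]
    y = [1 if row[0] > 0.5 else 0 for row in X]

    # Toy "model" that scores samples by column 0 only.
    def toy_predict(rows: list[list[float]]) -> list[float]:
        return [row[0] for row in rows]

    print(permutation_importance(toy_predict, X, y))
```

Because shuffling a column the model ignores cannot change its scores, the noise feature's importance is exactly zero, while shuffling the informative column pushes the AUC toward 0.5. The same `roc_auc` function could score reduced models, such as the variants described in the abstract that omit geochemical or sediment variables, on a common held-out set for comparison.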