Hazards and influence factors of arsenic in the Upper Pleistocene aquifer, Hetao region, using machine learning modeling

Sci Total Environ. 2024 Mar 15:916:170247. doi: 10.1016/j.scitotenv.2024.170247. Epub 2024 Jan 23.

Abstract

The Hetao region is among the areas of China most seriously affected by high arsenic concentrations in groundwater. The enrichment of arsenic in groundwater may pose a great risk to the health of local residents. A comprehensive understanding of groundwater quality and of the spatial distribution and hazard of high-arsenic groundwater is indispensable for the sustainable use of groundwater resources and for residents' health. This study selected six categories of environmental factors (climate, human activity, sedimentary environment, hydrogeology, soil, and others) as the independent input variables to the model, compared three machine learning algorithms (support vector machine, extreme gradient boosting, and random forest), and mapped unsafe arsenic levels to estimate the population that may be exposed to unhealthy conditions in the Hetao region. The results show that nearly half of the 605 wells sampled for arsenic exceeded the WHO provisional guideline value for drinking water, that the groundwater is mainly of the Na-HCO3-Cl or Na-Mg-HCO3-Cl type, and that groundwater with excessive arsenic is mainly concentrated in the ancient stream channel influence zone and the Yellow River crevasse splay. Factor importance analysis revealed that the sedimentary environment was the key factor affecting the concentration of primary high-arsenic groundwater, followed by climate and human activities. The random forest algorithm produced a probability distribution of high-arsenic groundwater that is consistent with the observed results. Groundwater with excessive arsenic was estimated to cover 38.81 % of the area. An estimated 940,000 people could be exposed to high arsenic in groundwater.

Keywords: Arsenic hazard distribution; Groundwater; Hetao region; High arsenic; Machine learning.
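
The final step described in the abstract, turning a modeled probability map into a hazard area percentage and an exposed-population estimate, can be illustrated with a minimal sketch. This is not the authors' procedure: the `GridCell` structure, the 0.5 probability cutoff, the per-cell groundwater-use fraction, and all numbers below are hypothetical and chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class GridCell:
    """One map cell of a hypothetical hazard grid (all fields are illustrative)."""
    prob_high_as: float              # modeled probability that arsenic exceeds the guideline
    area_km2: float                  # cell area
    population: float                # residents living in the cell
    groundwater_use_fraction: float  # assumed share of residents drinking groundwater


def summarize_hazard(cells, prob_cutoff=0.5):
    """Return hazard area (% of total) and exposed population.

    A cell counts as hazardous when its modeled probability is above
    prob_cutoff. The cutoff of 0.5 is an assumption, not a value taken
    from the study.
    """
    total_area = sum(c.area_km2 for c in cells)
    if total_area <= 0:
        raise ValueError("total area must be positive")

    hazard_cells = [c for c in cells if c.prob_high_as > prob_cutoff]
    hazard_area = sum(c.area_km2 for c in hazard_cells)
    exposed = sum(c.population * c.groundwater_use_fraction for c in hazard_cells)

    return {
        "hazard_area_percent": 100.0 * hazard_area / total_area,
        "exposed_population": exposed,
    }


# Synthetic cells, not data from the Hetao region.
cells = [
    GridCell(prob_high_as=0.82, area_km2=1.0, population=120, groundwater_use_fraction=0.9),
    GridCell(prob_high_as=0.35, area_km2=1.0, population=300, groundwater_use_fraction=0.9),
    GridCell(prob_high_as=0.61, area_km2=1.0, population=80, groundwater_use_fraction=0.5),
    GridCell(prob_high_as=0.10, area_km2=1.0, population=500, groundwater_use_fraction=0.2),
]

summary = summarize_hazard(cells)
print(summary)
```

In this toy grid, two of the four equal-area cells exceed the cutoff, so half the area is flagged, and only residents of those two cells, weighted by their assumed groundwater use, count as exposed. Changing the cutoff or the use fractions changes both figures, which is why such estimates are usually reported together with the assumptions behind them.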