Modeling arsenic in European topsoils with a coupled semiparametric (GAMLSS-RF) model for censored data

Environ Int. 2024 Mar:185:108544. doi: 10.1016/j.envint.2024.108544. Epub 2024 Mar 1.

Abstract

Arsenic (As) is a versatile metalloid trace element extensively used in industrial applications. As is a carcinogen, poses health risks through both inhalation and ingestion, and is associated with an increased risk of liver, kidney, lung, and bladder tumors. In the agricultural context, the repeated application of arsenical products leads to elevated soil concentrations, which are also affected by environmental and management variables. Since exposure to As poses risks, effective assessment tools to support environmental and health policies are needed. However, the most comprehensive soil As data available, the Land Use/Cover Area frame statistical Survey (LUCAS) database, has severe limitations due to high detection limits. Although within International Organization for Standardization standards, the detection limits preclude the adoption of standard methodologies for data analysis. The present work focused on developing a new method to model As contamination in European soils using LUCAS soil samples. We introduce the GAMLSS-RF model, a novel approach that couples Random Forests with Generalized Additive Models for Location, Scale, and Shape. The semiparametric model can capture non-linear interactions among input variables while accommodating censored and non-censored observations and can be calibrated to include information from other campaign databases. After fitting and validating a spatial model, we produced European-scale As concentration maps at a 250 m spatial resolution and evaluated the patterns against reference values (i.e., two action levels and a background concentration). We found significant variability of As concentration across the continent, with lower concentrations in Northern countries and higher concentrations in Portugal, Spain, Austria, France and Belgium. By overcoming limitations in existing databases and methodologies, the present approach provides an alternative way to handle highly censored data. The model also constitutes a valuable probabilistic tool for assessing As contamination risks in soils, contributing to informed policy-making for environmental and health protection.
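
To illustrate how censored and non-censored observations can enter a single likelihood, the following is a minimal sketch and not the authors' GAMLSS-RF implementation. It assumes a log-normal response with fixed parameters `mu` and `sigma` on the log scale (the abstract does not state the distribution used), and the function name `censored_lognormal_loglik` is hypothetical. Samples below the detection limit contribute the probability of falling below that limit, while measured samples contribute the density.

```python
import math


def normal_logpdf(z):
    """Log-density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal CDF, written with erfc for better accuracy in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def censored_lognormal_loglik(values, censored, mu, sigma):
    """Log-likelihood of left-censored log-normal data (illustrative assumption).

    values   : measured concentration, or the detection limit when censored
    censored : True where the sample was below the detection limit
    mu, sigma: mean and standard deviation of log(concentration)
    """
    total = 0.0
    for y, is_censored in zip(values, censored):
        z = (math.log(y) - mu) / sigma
        if is_censored:
            # Only known: true value < detection limit, so use P(Y < limit).
            total += math.log(normal_cdf(z))
        else:
            # Fully observed: log-normal density, including the 1/(y*sigma) Jacobian.
            total += normal_logpdf(z) - math.log(sigma) - math.log(y)
    return total
```

In a GAMLSS-type model, `mu` and `sigma` would depend on covariates rather than being constants; the sketch only shows how the two kinds of observations are combined.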

Keywords: Arsenic; GAMLSS; Random forest; Soil contamination; Statistical modeling; Trace element.

MeSH terms

  • Agriculture
  • Arsenic* / analysis
  • Environmental Monitoring / methods
  • France
  • Metals, Heavy* / analysis
  • Risk Assessment
  • Soil
  • Soil Pollutants* / analysis

Substances

  • Arsenic
  • Soil
  • Soil Pollutants
  • Metals, Heavy