Improving the efficiency of machine learning in simulating sedimentary heavy metal contamination by coupling preposing feature selection methods

Chemosphere. 2023 May:322:138205. doi: 10.1016/j.chemosphere.2023.138205. Epub 2023 Feb 21.

Abstract

Sediment cores were collected from Taihu Lake in China. The chronology was determined by radionuclide dating. Heavy metal concentrations and magnetic properties were measured for each core slice. The concentrations of most heavy metals in the sediments surged at a depth of 20 cm below the surface, accompanied by an increase in the concentration of single-domain magnetic particles. This may have resulted from the influence of anthropogenic activities on the lake's environment after the 1970s. Two feature selection methods, random forest (RF) and maximal information coefficient (MIC), were each combined with a support vector machine (SVM) model to simulate heavy metal concentrations, using the selected magnetic and physicochemical parameters as inputs. Compared with models built on the full set of parameters, models using the RF- and MIC-selected subsets still achieved reasonable simulation performance. RF performed better than MIC, increasing the R2 of the simulation models for Cd, Cr, Cu, Pb, and Sb. For heavy metals with high ecological risks (As, Cd, Cr, Hg, Pb, Sb), the correlation coefficients between observed and predicted data ranged from 0.73 to 0.97 with only 14-27% of the parameters selected by RF as input variables. The RF-RBF-SVM model (RF feature selection coupled with a radial basis function kernel SVM) enabled heavy metal predictions based on the magnetic properties of the lake sediments.
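The coupling of feature selection with an RBF-kernel SVM described in the abstract can be illustrated with a minimal sketch in Python using scikit-learn. This is not the authors' implementation. The data are synthetic stand-ins for the magnetic and physicochemical parameters, the helper names (select_top_features, fit_rbf_svr) are hypothetical, and the hyperparameters and the 20% retention fraction (chosen to fall inside the reported 14-27% range) are assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def select_top_features(X, y, keep_fraction=0.2, seed=0):
    """Rank features by random forest importance and keep the top fraction.

    Hypothetical helper: the ranking criterion and keep_fraction are assumptions.
    """
    rf = RandomForestRegressor(n_estimators=200, random_state=seed)
    rf.fit(X, y)
    n_keep = max(1, int(round(keep_fraction * X.shape[1])))
    order = np.argsort(rf.feature_importances_)[::-1]  # most important first
    return np.sort(order[:n_keep])


def fit_rbf_svr(X_train, y_train):
    """Fit a standardized RBF-kernel support vector regressor (assumed settings)."""
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, gamma="scale"))
    model.fit(X_train, y_train)
    return model


# Synthetic data (assumption): 15 candidate predictors, of which only the
# first three actually drive the simulated "metal concentration" y.
rng = np.random.default_rng(0)
n_samples, n_features = 120, 15
X = rng.normal(size=(n_samples, n_features))
y = (2.0 * X[:, 0] + 1.5 * np.sin(X[:, 1]) + X[:, 2]
     + 0.1 * rng.normal(size=n_samples))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Select features on the training split only, to avoid leaking test data.
selected = select_top_features(X_train, y_train, keep_fraction=0.2)

full_model = fit_rbf_svr(X_train, y_train)
reduced_model = fit_rbf_svr(X_train[:, selected], y_train)

r2_full = r2_score(y_test, full_model.predict(X_test))
r2_selected = r2_score(y_test, reduced_model.predict(X_test[:, selected]))

print("selected feature indices:", selected)
print(f"R2 with all features:      {r2_full:.3f}")
print(f"R2 with selected features: {r2_selected:.3f}")
```

Comparing the two R2 values mirrors the comparison in the abstract between models trained on the full parameter set and models trained on the RF-selected subset. With real core data, the synthetic X and y would be replaced by the measured parameters and the concentration of one metal per model.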

Keywords: Environmental magnetism; Lake sediment core; Maximal information coefficient; Random forest; Support vector machine.

MeSH terms

  • Cadmium
  • China
  • Environmental Monitoring / methods
  • Geologic Sediments / chemistry
  • Lakes / chemistry
  • Lead
  • Machine Learning
  • Metals, Heavy* / analysis
  • Risk Assessment
  • Water Pollutants, Chemical* / analysis

Substances

  • Cadmium
  • Lead
  • Water Pollutants, Chemical
  • Metals, Heavy