Preliminary risk assessment of regional industrial enterprise sites based on big data

Sci Total Environ. 2022 Sep 10;838(Pt 4):156609. doi: 10.1016/j.scitotenv.2022.156609. Epub 2022 Jun 9.

Abstract

An accurate and inexpensive preliminary risk assessment of industrial enterprise sites at a regional scale is critical for environmental management. In this study, we propose a novel framework for the preliminary risk assessment of industrial enterprise sites in the Yangtze River Delta, one of the fastest-developing and most prominently contaminated regions in China. Built on the source-pathway-receptor concept, the framework integrates text analysis, spatial analysis, and machine learning; its feasibility was validated with 8848 positive and negative samples split into calibration and validation sets at a ratio of 8:2. The random forest performed well for risk assessment: its accuracy, precision, recall, and F1 score were all 1.0 on the calibration set and ranged from 0.97 to 0.98 on the validation set, outperforming the other models tested (e.g., logistic regression, support vector machine, and convolutional neural network). The preliminary risk ranking of industrial enterprise sites produced by the random forest showed that high risks (high predicted probabilities) were mainly distributed in Shanghai, southern Jiangsu, and northeastern Zhejiang from 2000 to 2015. The relative importance of the site industrial, production, and geographical features in the random forest was 69%, 22%, and 9%, respectively. Our study shows that the proposed framework can quickly and effectively establish a priority (or ranking) list of industrial enterprise sites that require further investigation and identify potentially contaminated sites.
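The evaluation and ranking steps summarized in the abstract (an 8:2 calibration/validation split, the four binary classification scores, a probability-ranked priority list, and feature importances pooled into industrial, production, and geographical groups) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the record layout (dicts with `"id"` and `"label"` keys), the seeded shuffle, and all function names are hypothetical, and the trained classifier (a random forest in the study) is represented only as a callable `risk_prob` that returns a probability.

```python
"""Hypothetical sketch of the split, scoring, ranking, and importance-grouping steps.

Assumptions (not from the paper): sites are plain dicts with "id" and "label" keys,
the classifier is any callable returning a risk probability, and the split uses a
seeded shuffle rather than any specific sampling scheme.
"""
import random
from typing import Callable, Dict, List, Sequence, Tuple

Site = Dict[str, object]  # hypothetical record, e.g. {"id": "S1", "label": 1, ...}


def split_calibration_validation(
    samples: Sequence[Site], ratio: float = 0.8, seed: int = 0
) -> Tuple[List[Site], List[Site]]:
    """Shuffle, then split samples into calibration and validation sets (8:2 by default)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for a binary label (1 = positive sample)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def rank_sites(
    sites: Sequence[Site], risk_prob: Callable[[Site], float]
) -> List[Tuple[str, float]]:
    """Priority list: (site id, predicted risk probability), highest risk first."""
    scored = [(str(s["id"]), risk_prob(s)) for s in sites]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def group_importance(
    importance: Dict[str, float], group_of: Dict[str, str]
) -> Dict[str, float]:
    """Sum per-feature importances by feature group and normalise to shares of 1.0."""
    totals: Dict[str, float] = {}
    for feature, value in importance.items():
        group = group_of[feature]
        totals[group] = totals.get(group, 0.0) + value
    grand_total = sum(totals.values())
    return {group: value / grand_total for group, value in totals.items()}
```

With 8848 samples, `split_calibration_validation` yields 7078 calibration and 1770 validation records. Passing the classifier as a callable keeps the ranking step independent of the model family, so the same code could rank sites scored by logistic regression, a support vector machine, or a random forest.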

Keywords: Industrial enterprises; Machine learning; Preliminary risk assessment; Yangtze River Delta.

MeSH terms

  • Big Data*
  • China
  • Industry
  • Risk Assessment / methods
  • Rivers*