Identifying multiple soil pollutions of potentially contaminated sites based on multi-gate mixture-of-experts network

Sci Total Environ. 2023 Dec 10:903:166218. doi: 10.1016/j.scitotenv.2023.166218. Epub 2023 Aug 10.

Abstract

With the rapid increase in the amount and sources of big data, using big data and machine learning methods to identify site soil pollution has become a research hotspot. However, previous studies that used basic site information as pollution identification indexes have mainly suffered from low accuracy and efficiency when applying complex models to predict multiple soil pollution types. In this study, we collected environmental data from 199 sites in 6 typical industries involving heavy metal and organic pollution. After feature fusion and selection, 10 indexes based on pollution sources and pathways were used to establish the soil pollution identification index system. A Multi-gate Mixture-of-Experts network (MMoE) was constructed to identify soil heavy metal, VOC and SVOC pollution simultaneously as multiple tasks. The SHAP framework was used to reveal the importance of the pollution identification indexes for each of the MMoE outputs and to obtain their driving factors. The results showed that the accuracies of the MMoE model were 0.600, 0.783 and 0.850 for soil heavy metal, VOC and SVOC pollution identification, respectively, which were 0-20 % higher than those of single-task BP neural networks. The indexes of raw material containing organic compounds, enterprise scale, soil pollution traces and industry type were significantly important, to differing degrees, for the different types of site soil pollution. This study proposes a more efficient and accurate method to identify site soil pollution and its driving factors, which offers a step towards realizing intelligent identification and risk control of site soil pollution globally.

Keywords: Driving factor analysis; Model interpretability of SHAP framework; Multi-gate mixture-of-experts network; Multi-task learning; Soil pollution identification of sites.
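
Illustrative sketch: the abstract describes an MMoE that takes 10 identification indexes and outputs three pollution identifications (heavy metals, VOCs, SVOCs) at once. The following is a minimal NumPy forward pass of a generic MMoE, not the authors' implementation. The number of experts, the hidden size, the ReLU experts, the single-layer sigmoid towers, and the class name TinyMMoE are all assumptions made for illustration; the weights are random and untrained, so the outputs carry no meaning beyond showing the data flow.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyMMoE:
    """Hypothetical, untrained MMoE: shared experts, one gate and one tower per task."""

    def __init__(self, n_features=10, n_experts=4, expert_dim=8,
                 tasks=("heavy_metals", "VOCs", "SVOCs"), seed=0):
        rng = np.random.default_rng(seed)
        self.tasks = tasks
        # Shared experts: one linear layer each (n_experts and expert_dim are assumed).
        self.W_exp = rng.normal(0.0, 0.1, (n_experts, n_features, expert_dim))
        self.b_exp = np.zeros((n_experts, expert_dim))
        # One gating network per task, producing a weight for every expert.
        self.W_gate = {t: rng.normal(0.0, 0.1, (n_features, n_experts)) for t in tasks}
        # One task-specific tower per task, reduced here to a single linear unit.
        self.W_tower = {t: rng.normal(0.0, 0.1, (expert_dim,)) for t in tasks}
        self.b_tower = {t: 0.0 for t in tasks}

    def gate_weights(self, x, task):
        # Softmax over experts, so each row sums to 1.
        return softmax(x @ self.W_gate[task], axis=-1)

    def forward(self, x):
        # Expert outputs with ReLU, shape (batch, n_experts, expert_dim).
        h = np.maximum(0.0, np.einsum("bf,efd->bed", x, self.W_exp) + self.b_exp)
        out = {}
        for t in self.tasks:
            g = self.gate_weights(x, t)              # (batch, n_experts)
            mixed = np.einsum("be,bed->bd", g, h)    # task-specific mixture of experts
            out[t] = sigmoid(mixed @ self.W_tower[t] + self.b_tower[t])
        return out

model = TinyMMoE()
x = np.random.default_rng(1).normal(size=(5, 10))  # 5 synthetic sites, 10 indexes
probs = model.forward(x)                            # one probability vector per task
```

The point of the design is that all three tasks share the same experts, while each task's gate decides how much weight each expert receives for a given site, so related tasks can share information without being forced to use an identical representation.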