Machine learning analysis to interpret the effect of the photocatalytic reaction rate constant (k) of semiconductor-based photocatalysts on dye removal

J Hazard Mater. 2024 Mar 5:465:132995. doi: 10.1016/j.jhazmat.2023.132995. Epub 2023 Nov 11.

Abstract

Photocatalytic reactions with semiconductor-based photocatalysts have been investigated extensively for wastewater treatment, especially dye degradation, yet the interactions between different process parameters have rarely been reported because of the complicated reaction mechanisms involved. Hence, this study aims to discern the impact of each factor, and of each interaction between multiple factors, on the reaction rate constant (k) using a decision tree model. The dyes selected as target pollutants were indigo and malachite green, and 5 different semiconductor-based photocatalysts with 17 different compositions were tested, which generated 34 input features and 1527 data points. Boruta Shapley Additive exPlanations (SHAP) feature selection applied to the 34 inputs found that 11 were significantly important. The decision tree model built on these 11 input features achieved an R² value of 0.94. The SHAP feature importance analysis suggested that photocatalytic experimental conditions, with an importance of 59%, were the most important input category, followed by atomic composition (39%) and physicochemical properties (2%). Additionally, the effects on k of the synergy between the metal cocatalysts and important experimental conditions were confirmed by two-feature SHAP dependence plots, regardless of the features' order of importance. This work provides insight into the single and multiple factors that affect reaction rate and mechanism.
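
To make the SHAP-based importance figures above more concrete, the following is a minimal sketch of how Shapley values attribute a prediction to individual inputs. It is not the study's pipeline: the function `toy_rate_constant`, its three inputs (light intensity, catalyst dose, cocatalyst loading), its coefficients, and the all-zero baseline are hypothetical, and the brute-force enumeration stands in for the tree-based SHAP algorithms normally used with decision tree models. The light-cocatalyst product term is included only to show how a synergy between two inputs is shared between them.

```python
from itertools import combinations
from math import factorial


def toy_rate_constant(x):
    """Hypothetical stand-in for a fitted model predicting k (not from the study)."""
    light, dose, cocatalyst = x
    # Two additive terms plus one light-cocatalyst interaction term.
    return 0.02 * light + 0.01 * dose + 0.03 * light * cocatalyst


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline point."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their observed value, the rest the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


x = [1.0, 2.0, 1.0]         # hypothetical sample: light, dose, cocatalyst
baseline = [0.0, 0.0, 0.0]  # hypothetical reference point
phi = shapley_values(toy_rate_constant, x, baseline)

# Share of total absolute attribution per (hypothetical) input category.
categories = {"experimental conditions": [0, 1], "atomic composition": [2]}
total_abs = sum(abs(p) for p in phi)
shares = {name: sum(abs(phi[i]) for i in idx) / total_abs
          for name, idx in categories.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The attributions sum to the difference between the prediction at `x` and at the baseline, and the interaction term is split equally between the two inputs that form it. Category percentages such as those reported in the abstract would be obtained by averaging absolute attributions over all data points rather than the single sample used here.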

Keywords: Dye degradation; Machine learning; Photocatalysis; Reaction rate constant (k); Two-feature SHAP interactions.