Experiments and machine learning-based modeling for haloacetic acids rejection by nanofiltration: Influence of solute properties and operating conditions

Sci Total Environ. 2023 Jul 20:883:163610. doi: 10.1016/j.scitotenv.2023.163610. Epub 2023 Apr 23.

Abstract

Because of potential risks to public health, the presence of haloacetic acids (HAAs) in drinking water is a major concern. Nanofiltration (NF) has shown potential for HAAs rejection, and several factors, namely membrane properties, solute properties, and operating conditions, have been shown to play key roles. However, quantitative understanding of how these factors govern the NF separation mechanism remains limited. This study investigated and modeled NF performance on HAAs rejection. NF performance was experimentally investigated under various transmembrane pressures (TMP), cross-flow velocities (CV), temperatures, pH values, ionic strengths (IS), and initial HAAs feed concentrations (Cin). We used machine learning (ML) to understand the mechanism from the perspective of HAAs properties and operating conditions. Multiple linear regression (MLR), support vector machine (SVM), multilayer perceptron (MLP), extreme gradient boosting (XGBoost), and random forest (RF) models were used. The MLP, XGBoost, and RF models performed well, with high R² (0.970, 0.973, and 0.980, respectively) and low RMSE (4.71, 4.41, and 3.84, respectively). These three models were analyzed using SHapley Additive exPlanations (SHAP) to quantify the relative contributions of HAAs properties and operating conditions. XGBoost-SHAP produced the most logical results and was the best-performing model for selecting optimal combinations of input variables. The results showed that Stokes radius (rs), logarithmic octanol-water partitioning coefficient (logKow), molecular weight (MW), pH, TMP, and temperature are key variables for interpreting the NF process. The effects of HAAs properties were ranked as rs > logKow > MW, suggesting the importance of size exclusion and hydrophobic interaction. The impact of the operating conditions followed the order pH > TMP > temperature, indicating that pH was the most influential operating condition. This study demonstrated the considerable capacity of ML to reduce the amount of experimental work. In addition, the main operating conditions can be evaluated in terms of their contributions, making ML an efficient tool for risk management and process optimization.
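
To make the idea of ranking variables by their Shapley contributions concrete, the following is a minimal sketch of an exact Shapley value computation for a single prediction. It is not the study's pipeline: `toy_model` is a hypothetical stand-in for a trained regressor, the feature names are borrowed from the abstract only as labels, and every numeric value is made up for illustration. In practice, SHAP implementations use faster model-specific algorithms rather than this brute-force enumeration over feature subsets.

```python
from itertools import combinations
from math import factorial

# Feature labels borrowed from the abstract; all numbers below are invented.
FEATURES = ["rs", "logKow", "pH"]


def toy_model(x):
    """Hypothetical stand-in for a trained rejection model (not the study's)."""
    return 40.0 * x["rs"] + 10.0 * x["logKow"] + 5.0 * x["pH"] * x["rs"]


def coalition_value(model, x, baseline, subset):
    """Predict with features in `subset` taken from x and the rest from baseline."""
    mixed = {f: (x[f] if f in subset else baseline[f]) for f in FEATURES}
    return model(mixed)


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature for one sample, relative to baseline."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without_f = set(subset)
                with_f = without_f | {f}
                gain = (coalition_value(model, x, baseline, with_f)
                        - coalition_value(model, x, baseline, without_f))
                total += weight * gain
        phi[f] = total
    return phi


def rank_features(phi):
    """Order features by absolute contribution, largest first."""
    return sorted(phi, key=lambda f: abs(phi[f]), reverse=True)


# Made-up sample and reference point (assumptions, not experimental data).
x = {"rs": 0.3, "logKow": 1.0, "pH": 7.0}
baseline = {"rs": 0.2, "logKow": 0.5, "pH": 5.0}

phi = shapley_values(toy_model, x, baseline)
ranking = rank_features(phi)
print(phi)
print(ranking)
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the property that lets per-feature contributions be compared on a common scale. A global ranking such as the ones reported above would typically average the absolute contributions over many samples rather than use a single one.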

Keywords: Haloacetic acids rejection; Machine learning; Nanofiltration process; SHAP; XGBoost.