Optimizing the Quality of Machine Learning for Identifying the Share of Biogenic and Fossil Carbon in Solid Waste

Anal Chem. 2023 Mar 7;95(9):4412-4420. doi: 10.1021/acs.analchem.2c04940. Epub 2023 Feb 23.

Abstract

Insights into the carbon sources (biogenic and fossil carbon) and carbon contents of solid waste are vital for estimating the carbon emissions from incineration plants. However, the traditional methods are time-, labor-, and cost-intensive. Herein, high-quality data sets were established by analyzing the carbon contents and infrared spectra of a large number of samples using elemental analysis and attenuated total reflectance-Fourier transform infrared spectroscopy (ATR-FTIR), respectively. Then, five classification and eight regression machine learning (ML) models were evaluated for identifying the proportion of biogenic and fossil carbon in solid waste. With the optimized data preprocessing approach, the random forest (RF) classifier with hyperparameter tuning ranked first in classifying the carbon group, with a test accuracy of 0.969, and the RF regressor successfully predicted the carbon contents with R² = 0.926 when the trade-off among performance, interpretability, and computation time was considered. The proposed algorithms were further validated with real environmental samples, on which they remained robust, with an accuracy of 0.898 for carbon group classification and an R² value of 0.851 for carbon content prediction. These results indicate that ATR-FTIR coupled with ML algorithms is feasible for rapidly identifying both carbon groups and contents, facilitating the calculation and assessment of carbon emissions from solid waste incineration.
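The workflow summarized in the abstract (preprocess spectra, tune an RF classifier for the carbon group, fit an RF regressor for the carbon content, then report test accuracy and R²) can be illustrated with a minimal sketch in Python using scikit-learn. Everything specific here is an assumption made for illustration and is not taken from the paper: the spectra are synthetic, the two-band signal model, the binary carbon-group label, the standard normal variate (SNV) preprocessing, the hyperparameter grid, and all variable names are hypothetical. The scores it prints say nothing about the published results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for ATR-FTIR spectra (NOT the paper's data):
# 300 samples x 200 spectral bins, mixing two hypothetical absorption bands.
n_samples, n_bins = 300, 200
fossil_fraction = rng.uniform(0.0, 1.0, n_samples)  # hypothetical regression target
axis = np.linspace(0.0, 1.0, n_bins)
biogenic_band = np.exp(-((axis - 0.3) ** 2) / 0.002)
fossil_band = np.exp(-((axis - 0.7) ** 2) / 0.002)
spectra = (
    np.outer(1.0 - fossil_fraction, biogenic_band)
    + np.outer(fossil_fraction, fossil_band)
    + rng.normal(0.0, 0.05, (n_samples, n_bins))
)


def snv(X):
    """Standard normal variate: center and scale each spectrum (row).

    Assumed preprocessing step; the abstract does not name the method used.
    """
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)


X = snv(spectra)
# Assumed binary label: 1 = mostly fossil carbon, 0 = mostly biogenic carbon.
carbon_group = (fossil_fraction > 0.5).astype(int)

X_train, X_test, g_train, g_test, f_train, f_test = train_test_split(
    X, carbon_group, fossil_fraction, test_size=0.25, random_state=0
)

# RF classifier with a small, hypothetical hyperparameter grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 10]},
    cv=3,
)
search.fit(X_train, g_train)
test_accuracy = accuracy_score(g_test, search.predict(X_test))

# RF regressor for the continuous target.
regressor = RandomForestRegressor(n_estimators=100, random_state=0)
regressor.fit(X_train, f_train)
test_r2 = r2_score(f_test, regressor.predict(X_test))

print("best classifier params:", search.best_params_)
print("test accuracy:", round(test_accuracy, 3))
print("test R2:", round(test_r2, 3))
```

Holding out a test split before any tuning, as done here, keeps the reported accuracy and R² from being inflated by the hyperparameter search. A validation on independently collected samples, like the one the abstract describes for real environmental samples, would be a separate step applied to the already fitted models.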