Machine Learning Analysis of Raman Spectra To Quantify the Organic Constituents in Complex Organic-Mineral Mixtures

Anal Chem. 2023 Oct 31;95(43):15908-15916. doi: 10.1021/acs.analchem.3c02348. Epub 2023 Sep 12.

Abstract

Important decisions in local agricultural policy and practice often hinge on the soil's chemical composition. Raman spectroscopy offers a rapid, noninvasive means to quantify the constituents of complex organic systems. However, applying Raman spectroscopy to soils presents a multifaceted challenge because of organic/mineral compositional complexity and spectral interference from overwhelming fluorescence. The present work compares methodologies that can help overcome common obstacles in the analysis of soils. We created conditions representative of these challenges by combining varying proportions of six amino acids commonly found in soils with fluorescent bentonite clay and coarse mineral components. Drawing on an extensive data set of Raman spectra, we compare the performance of convolutional neural network (CNN) and partial least-squares regression (PLSR) multivariate models for predicting amino acid composition. Strategies employing volume-averaged spectral sampling and data preprocessing algorithms improve the predictive power of these models. Our average test R² for PLSR models exceeds 0.89 and approaches 0.98, depending on the complexity of the matrix, whereas CNN yields R² values ranging from 0.91 to 0.97. Classic PLSR and CNN thus perform comparably, except where the signal-to-noise ratio of the organic component is very low, in which case the CNN models outperform PLSR. By artificially isolating two of the most prevalent obstacles in evaluating the Raman spectra of soils, we have characterized the effect of each obstacle on the performance of machine learning models in the absence of other complexities. These results highlight important considerations and modeling strategies needed to improve the Raman analysis of organic compounds in complex mixtures in the presence of mineral spectral components and significant fluorescence.
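
The abstract reports model quality as an average test R² across the predicted amino acid components. As a minimal sketch of how such a figure can be computed, the Python below evaluates the coefficient of determination per component and then averages it. The function names, the component names, and the concentration values are hypothetical and chosen only for illustration; they are not the authors' code or data, and the paper's exact averaging scheme may differ.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)              # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the true values are constant")
    return 1.0 - ss_res / ss_tot


def average_r2(true_by_component, pred_by_component):
    """Mean of the per-component R^2 values (assumed averaging scheme)."""
    return mean(
        r2_score(true_by_component[name], pred_by_component[name])
        for name in true_by_component
    )


if __name__ == "__main__":
    # Hypothetical held-out mass fractions for two components (illustrative only).
    true_fractions = {
        "glycine": [0.10, 0.25, 0.40, 0.05],
        "alanine": [0.30, 0.15, 0.20, 0.45],
    }
    predicted_fractions = {
        "glycine": [0.12, 0.24, 0.37, 0.07],
        "alanine": [0.28, 0.17, 0.22, 0.41],
    }
    print(f"average test R^2: {average_r2(true_fractions, predicted_fractions):.3f}")
```

A value of 1 means the predictions match the held-out concentrations exactly, while a value of 0 means the model does no better than predicting the mean concentration for every sample.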