Chemometric analysis in Raman spectroscopy from experimental design to machine learning-based modeling

Nat Protoc. 2021 Dec;16(12):5426-5459. doi: 10.1038/s41596-021-00620-3. Epub 2021 Nov 5.

Abstract

Raman spectroscopy is increasingly being used in biology, forensics, diagnostics, pharmaceutics and food science applications. This growth is driven not only by improvements in the computational and experimental setups but also by the development of chemometric techniques. Chemometric techniques are the analytical processes used to detect and extract information from subtle differences in Raman spectra obtained from related samples. This information could be used to find out, for example, whether a mixture of bacterial cells contains different species, or whether a mammalian cell is healthy or not. Chemometric techniques include spectral processing (ensuring that the spectra used for the subsequent computational processes are as clean as possible) as well as the statistical analysis of the data required for finding the spectral differences that are most useful for differentiating between, for example, different cell types. For Raman spectra, this analysis process is not yet standardized, and there are many pitfalls. This protocol provides guidance on how to perform a Raman spectral analysis, how to avoid these pitfalls and how to circumvent problematic issues. The protocol is divided into four parts: experimental design, data preprocessing, data learning and model transfer. We illustrate our workflow using three datasets in which spectra from individual cells were collected in single-cell mode, and one dataset collected in a raster scanning-based Raman spectral imaging experiment on mouse tissue. Our aim is to help move Raman-based technologies from proof-of-concept studies toward real-world applications.
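The preprocessing and data learning steps named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the protocol itself: the synthetic two-class spectra, the iterative polynomial baseline fit, vector normalization, PCA via singular value decomposition, the nearest-centroid classifier and all function names are hypothetical choices. The one general point it aims to show is that the model (here the PCA projection) is fitted on the training folds only, so the held-out spectra do not leak into the model.

```python
import numpy as np


def make_synthetic_spectra(n_per_class=30, n_points=200, seed=0):
    """Hypothetical toy data: two classes that differ in the intensity of one band."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_points)  # stand-in for a wavenumber axis
    band_a = np.exp(-0.5 * ((x - 0.3) / 0.02) ** 2)
    band_b = np.exp(-0.5 * ((x - 0.7) / 0.02) ** 2)
    spectra, labels = [], []
    for label, weight in ((0, 0.5), (1, 1.5)):
        for _ in range(n_per_class):
            # Smooth background with random amplitude, plus measurement noise.
            baseline = rng.uniform(0.5, 1.5) * (1.0 + x + x**2)
            noise = rng.normal(0.0, 0.02, n_points)
            spectra.append(band_a + weight * band_b + baseline + noise)
            labels.append(label)
    return x, np.array(spectra), np.array(labels)


def subtract_polynomial_baseline(x, spectrum, degree=2, n_iter=20):
    """Fit a polynomial repeatedly, clipping the data to the fit so peaks are ignored."""
    work = spectrum.copy()
    for _ in range(n_iter):
        fit = np.polyval(np.polyfit(x, work, degree), x)
        work = np.minimum(work, fit)
    return spectrum - fit


def vector_normalize(spectra):
    """Scale every spectrum (row) to unit Euclidean norm."""
    return spectra / np.linalg.norm(spectra, axis=1, keepdims=True)


def pca_fit(train, n_components=3):
    """Return the training mean and the leading principal axes (rows)."""
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]


def nearest_centroid_predict(train_scores, train_labels, test_scores):
    """Assign each test point the label of the closest class mean."""
    classes = np.unique(train_labels)
    centroids = np.array(
        [train_scores[train_labels == c].mean(axis=0) for c in classes]
    )
    dists = np.linalg.norm(test_scores[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


def cross_validate(spectra, labels, n_folds=5, seed=0):
    """k-fold cross-validation with PCA refitted on the training folds only."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(labels))
    correct = 0
    for test_idx in np.array_split(order, n_folds):
        train_idx = np.setdiff1d(order, test_idx)
        mean, components = pca_fit(spectra[train_idx])
        train_scores = (spectra[train_idx] - mean) @ components.T
        test_scores = (spectra[test_idx] - mean) @ components.T
        pred = nearest_centroid_predict(train_scores, labels[train_idx], test_scores)
        correct += int(np.sum(pred == labels[test_idx]))
    return correct / len(labels)


x, raw, labels = make_synthetic_spectra()
corrected = np.array([subtract_polynomial_baseline(x, s) for s in raw])
processed = vector_normalize(corrected)
accuracy = cross_validate(processed, labels)
print(f"cross-validated accuracy: {accuracy:.2f}")
```

With real measurements, the random folds used here would probably need to be replaced by splits over independent units such as batches, replicates or individuals, which is one way the experimental design interacts with model evaluation.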

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Animals
  • Calibration
  • Chemometrics / instrumentation
  • Chemometrics / methods*
  • Data Interpretation, Statistical
  • Datasets as Topic
  • Humans
  • Machine Learning*
  • Mice
  • Models, Statistical*
  • Principal Component Analysis
  • Reference Standards
  • Spectrum Analysis, Raman / instrumentation
  • Spectrum Analysis, Raman / methods
  • Spectrum Analysis, Raman / standards*