Machine learning prediction of lignin content in poplar with Raman spectroscopy

Bioresour Technol. 2022 Mar:348:126812. doi: 10.1016/j.biortech.2022.126812. Epub 2022 Feb 4.

Abstract

Prediction models for lignin content in poplar were developed from features extracted from Raman spectra, using regularization algorithms, support vector regression (SVR), decision tree (DT), random forest (RF), LightGBM, CatBoost, and XGBoost. First, Raman features extracted from the processed FT-Raman spectra were used as model inputs, and the measured lignin contents were used as outputs. Second, grid search combined with cross-validation was used to tune the hyperparameters of the models. Finally, predictive models were built with each of these algorithms. The regularization algorithms, SVR, and DT achieved test R² values >0.80, which means that their predicted values still deviated from the measured ones. RF, LightGBM, CatBoost, and XGBoost performed better, with test R² values >0.91, which suggests that their predicted values were close to the measured ones. Fast and accurate methods for predicting lignin content were therefore obtained, and they will be useful for screening lignocellulosic resources with a desired lignin content.
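
The tuning and evaluation steps described in the abstract (grid search combined with cross-validation, then a test R²) can be illustrated with a minimal sketch. It is not the authors' code and makes several assumptions: the data are synthetic toy values rather than measurements from the study, the single input feature stands in for a hypothetical Raman band ratio, and a one-feature ridge regression written in plain Python stands in for the regularization algorithms. The helper names (`fit_ridge`, `cv_score`, `grid_search`) and the candidate `alphas` are illustrative only.

```python
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_ridge(xs, ys, alpha):
    """One-feature ridge regression; returns (slope, intercept).

    The penalty alpha shrinks the slope; the intercept is not penalized.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    return slope, y_bar - slope * x_bar


def cv_score(xs, ys, alpha, k=5):
    """Mean validation R² over k interleaved folds for one alpha."""
    n = len(xs)
    scores = []
    for fold in range(k):
        val_idx = set(range(fold, n, k))
        tr_x = [xs[i] for i in range(n) if i not in val_idx]
        tr_y = [ys[i] for i in range(n) if i not in val_idx]
        va_x = [xs[i] for i in sorted(val_idx)]
        va_y = [ys[i] for i in sorted(val_idx)]
        slope, intercept = fit_ridge(tr_x, tr_y, alpha)
        preds = [slope * x + intercept for x in va_x]
        scores.append(r2_score(va_y, preds))
    return mean(scores)


def grid_search(xs, ys, alphas, k=5):
    """Try every alpha in the grid and return the best one by CV R²."""
    results = {alpha: cv_score(xs, ys, alpha, k) for alpha in alphas}
    best_alpha = max(results, key=results.get)
    return best_alpha, results


# Synthetic toy data (NOT from the study): a hypothetical band ratio x
# and a made-up linear relation to lignin content plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0.5, 1.5) for _ in range(60)]
ys = [15.0 + 10.0 * x + rng.gauss(0.0, 0.5) for x in xs]

# Hold out the last 12 samples as a test set; tune only on the rest.
train_x, test_x = xs[:48], xs[48:]
train_y, test_y = ys[:48], ys[48:]

alphas = [0.01, 0.1, 1.0, 10.0]  # illustrative hyperparameter grid
best_alpha, cv_results = grid_search(train_x, train_y, alphas)

# Refit on the full training set with the selected alpha, then score once.
slope, intercept = fit_ridge(train_x, train_y, best_alpha)
test_r2 = r2_score(test_y, [slope * x + intercept for x in test_x])
print("best alpha:", best_alpha)
print("test R2:", round(test_r2, 3))
```

The test set is scored only once, after the hyperparameter has been chosen by cross-validation on the training data, so the reported test R² is not biased by the tuning. With many Raman features and tree-based models such as RF or XGBoost, the same loop would typically be delegated to a library's grid-search utility instead of being written by hand.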

Keywords: Gradient boosting machine; Lignin content; Machine learning; Raman spectroscopy; XGBoost.

MeSH terms

  • Algorithms
  • Lignin*
  • Machine Learning
  • Populus*
  • Spectrum Analysis, Raman

Substances

  • Lignin