Machine learning prediction of lignin content in poplar with Raman spectroscopy

Wenli Gao; Liang Zhou; Shengquan Liu; Ying Guan; Hui Gao; Bin Hui

doi:10.1016/j.biortech.2022.126812

Bioresour Technol. 2022 Mar:348:126812. doi: 10.1016/j.biortech.2022.126812. Epub 2022 Feb 4.

Authors

Wenli Gao¹, Liang Zhou², Shengquan Liu¹, Ying Guan¹, Hui Gao¹, Bin Hui³

Affiliations

¹ School of Forestry and Landscape Architecture, Anhui Agricultural University, Hefei, Anhui 230036, PR China; Key Lab of State Forest and Grassland Administration on Wood Quality Improvement & High Efficient Utilization, Hefei, Anhui 230036, PR China.
² School of Forestry and Landscape Architecture, Anhui Agricultural University, Hefei, Anhui 230036, PR China; Key Lab of State Forest and Grassland Administration on Wood Quality Improvement & High Efficient Utilization, Hefei, Anhui 230036, PR China. Electronic address: mcyjs1@ahau.edu.cn.
³ State Key Laboratory of Bio-Fibers and Eco-Textiles, Institute of Marine Biobased Materials, School of Materials Science and Engineering, Qingdao University, Qingdao 266071, PR China.

PMID: 35131461

Abstract

Prediction models for lignin content in poplar were developed from features extracted from Raman spectra, using regularization algorithms, support vector regression (SVR), decision tree (DT), random forest (RF), LightGBM, CatBoost, and XGBoost. First, Raman features extracted from the processed FT-Raman spectra were used as model inputs, and the measured lignin contents were used as outputs. Second, grid search combined with cross-validation was used to tune the hyper-parameters of the models. Finally, predictive models were built with each of these algorithms. The regularization algorithms, SVR, and DT achieved test R² values >0.80, which means their predicted values still deviated from the measured ones. RF, LightGBM, CatBoost, and XGBoost performed better, with test R² values >0.91, suggesting that their predicted values were close to the measured ones. Fast and accurate methods for predicting lignin content were thus obtained, and they will be useful for screening lignocellulosic resources with a desired lignin content.
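
The tuning workflow described in the abstract (grid search combined with cross-validation, followed by evaluation with a test R²) can be illustrated with a minimal Python sketch. This is not the authors' code or data. It assumes ridge regression as one example of a regularization algorithm, a synthetic feature matrix standing in for the Raman features, and hypothetical helper names (fit_ridge, r2_score, grid_search_cv), with NumPy as the only dependency.

```python
import numpy as np

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression; the intercept is left unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef

def r2_score(y_true, y_pred):
    """Coefficient of determination (R²)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def grid_search_cv(X, y, alphas, n_folds=5, seed=0):
    """Return the alpha with the highest mean validation R² over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    mean_scores = {}
    for alpha in alphas:
        scores = []
        for k in range(n_folds):
            val = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            coef, intercept = fit_ridge(X[train], y[train], alpha)
            scores.append(r2_score(y[val], X[val] @ coef + intercept))
        mean_scores[alpha] = float(np.mean(scores))
    best_alpha = max(mean_scores, key=mean_scores.get)
    return best_alpha, mean_scores

# Synthetic stand-in data (assumption): 120 samples, 6 spectral features,
# and a lignin-like response centered near 22 with added noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 6))
true_coef = np.array([2.0, -1.0, 0.5, 0.0, 0.0, 1.5])
y = 22.0 + X @ true_coef + rng.normal(scale=0.5, size=120)

# Hold out a test set that is never seen during hyper-parameter tuning.
X_train, X_test = X[:90], X[90:]
y_train, y_test = y[:90], y[90:]

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]  # hypothetical hyper-parameter grid
best_alpha, cv_scores = grid_search_cv(X_train, y_train, alphas)

# Refit on the full training set with the selected alpha, then score on the test set.
coef, intercept = fit_ridge(X_train, y_train, best_alpha)
test_r2 = r2_score(y_test, X_test @ coef + intercept)
print(f"best alpha: {best_alpha}, test R2: {test_r2:.3f}")
```

The same pattern (tune on cross-validation folds of the training data, then report R² on a held-out test set) would apply to the tree-based and boosting models named in the abstract, with the grid defined over their own hyper-parameters.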

Keywords: Gradient boosting machine; Lignin content; Machine learning; Raman spectroscopy; XGBoost.

MeSH terms

Algorithms
Lignin*
Machine Learning
Populus*
Spectrum Analysis, Raman

Substances

Lignin