Effect of variable selection algorithms on model performance for predicting moisture content in biological materials using spectral data

Anal Chim Acta. 2022 Apr 15:1202:339390. doi: 10.1016/j.aca.2021.339390. Epub 2021 Dec 21.

Abstract

Variable selection is a critical step in designing a dedicated real-time multispectral system from multicollinear spectral data. It improves the predictive ability of the calibration model and speeds up prediction by reducing the dimensionality of the data. The main objective of this study was to compare the effect of variable selection algorithms on model performance for predicting moisture content in red meat, using visible and near-infrared (VNIR) hyperspectral imaging in the spectral range of 400-1000 nm, and in corn, using near-infrared (NIR) spectroscopy in the spectral range of 1100-2498 nm. Six variable selection algorithms, namely the size of the regression coefficient (RC), variable importance in projection (VIP), genetic algorithm (GA), competitive adaptive reweighted sampling (CARS), successive projection algorithm (SPA), and stepwise regression (SWR), were tested and compared to assess their effects on model performance for both materials. The model based on competitive adaptive reweighted sampling-partial least squares regression (CARS-PLSR) gave the best predictions of moisture content in both red meat and corn. The results indicate that variable selection is effective at identifying the feature wavelengths needed to design a low-cost, real-time multispectral system.
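
To make one of the compared approaches concrete, the following is a minimal sketch of VIP-based variable selection on top of a single-response PLS model. It is not the authors' implementation: the NIPALS-style fitting routine, the function name `pls1_vip`, the synthetic data, the choice of two latent variables, and the commonly used VIP > 1 cutoff are all assumptions made for illustration, and the code uses only the Python standard library so that it stays self-contained.

```python
import math
import random


def pls1_vip(X, y, n_components):
    """Fit PLS1 with a NIPALS-style loop and return one VIP score per column of X.

    Hypothetical helper written for illustration only.
    """
    n, p = len(X), len(X[0])
    # Mean-center copies of X and y so the caller's data is left untouched.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]

    weights, explained = [], []
    for _ in range(n_components):
        # Weight vector: direction in X most covariant with the y residual.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        # Scores, X loadings, and y loading for this latent variable.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next latent variable.
        for i in range(n):
            for j in range(p):
                E[i][j] -= t[i] * load[j]
            f[i] -= q * t[i]
        weights.append(w)
        explained.append(q * q * tt)  # y sum of squares explained by this component

    total = sum(explained)
    # VIP_j = sqrt(p * sum_a(SS_a * w_ja^2) / sum_a(SS_a)), with unit-norm w_a.
    return [
        math.sqrt(p * sum(ss * w[j] ** 2 for ss, w in zip(explained, weights)) / total)
        for j in range(p)
    ]


# Synthetic stand-in for spectra (assumption): 200 samples, 12 "wavelengths",
# with the response driven only by variables 3 and 8 plus a little noise.
rng = random.Random(0)
n_samples, n_vars = 200, 12
X = [[rng.gauss(0, 1) for _ in range(n_vars)] for _ in range(n_samples)]
y = [2.0 * row[3] - 1.5 * row[8] + rng.gauss(0, 0.1) for row in X]

vip = pls1_vip(X, y, n_components=2)
ranked = sorted(range(n_vars), key=lambda j: vip[j], reverse=True)
selected = [j for j in range(n_vars) if vip[j] > 1.0]  # VIP > 1 rule of thumb
print("variables ranked by VIP:", ranked)
print("selected variables:", selected)
```

In a real calibration the columns of `X` would be absorbance or reflectance values at each wavelength, the number of latent variables would be chosen by cross-validation, and the retained wavelengths would be used to refit a reduced PLSR model.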

Keywords: Corn; Hyperspectral imaging; Meat; Moisture content; Spectroscopy; Variable selection.

MeSH terms

  • Algorithms
  • Least-Squares Analysis
  • Red Meat*
  • Spectroscopy, Near-Infrared* / methods
  • Zea mays