Feature selection of infrared spectra analysis with convolutional neural network

Spectrochim Acta A Mol Biomol Spectrosc. 2022 Feb 5:266:120361. doi: 10.1016/j.saa.2021.120361. Epub 2021 Sep 4.

Abstract

Data-driven deep learning analysis, especially with convolutional neural networks (CNNs), has been developed and successfully applied in many domains. A CNN is often regarded as a black box, and its main drawback is a lack of interpretability. In this study, an interpretable CNN model is presented for infrared data analysis. An ascending stepwise linear regression (ASLR)-based approach was used to extract the informative neurons in the flatten layer of the trained model. The characteristics of the CNN were then used to visualize the active variables corresponding to the extracted neurons. A partial least squares (PLS) model was used for comparison, in terms of both the performance of the extracted features and model interpretation. Using the extracted features, the CNN models achieved test-set accuracies of 93.27%, 97.50%, and 96.65% on the tablet, meat, and juice datasets, respectively, while the PLS-DA models using latent variables (LVs) achieved 95.19%, 95.50%, and 98.17%. Both the CNN and PLS models showed stable patterns in the active variables. The repeatability of the CNN model and the proposed strategies was verified by Monte Carlo cross-validation.
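The idea of picking informative flatten-layer neurons by ascending stepwise regression can be illustrated with a minimal sketch. The abstract does not state the exact ASLR criterion, so the code below assumes a plain greedy forward selection that, at each step, adds the column giving the largest drop in the residual sum of squares of an ordinary least-squares fit. The function name `forward_stepwise_select`, the stopping tolerance `tol`, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def forward_stepwise_select(X, y, max_features=5, tol=1e-6):
    """Greedy forward selection (an assumed stand-in for ASLR).

    X: (n_samples, n_neurons) activations, e.g. from a flatten layer.
    y: (n_samples,) numeric response.
    Returns the selected column indices in the order they were added.
    """
    n, p = X.shape
    selected = []
    # Baseline: residual sum of squares of the intercept-only model.
    best_rss = float(np.sum((y - y.mean()) ** 2))
    while len(selected) < min(max_features, p):
        candidate, candidate_rss = None, best_rss
        for j in range(p):
            if j in selected:
                continue
            # Least-squares fit with intercept on the trial column set.
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            if rss < candidate_rss:
                candidate, candidate_rss = j, rss
        # Stop when no column gives a meaningful relative improvement.
        if candidate is None or best_rss - candidate_rss < tol * max(best_rss, 1e-12):
            break
        selected.append(candidate)
        best_rss = candidate_rss
    return selected


# Synthetic stand-in for flatten-layer activations: only columns 2 and 7
# carry signal (an assumption made purely for illustration).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
y_demo = 3.0 * X_demo[:, 2] - 2.0 * X_demo[:, 7] + 0.01 * rng.normal(size=200)
selected = forward_stepwise_select(X_demo, y_demo, max_features=2)
print(selected)
```

In this toy setting the two signal-carrying columns are the ones expected to be selected. In a real pipeline, `X` would hold the flatten-layer outputs of the trained CNN and the selected indices would then be traced back to spectral variables, a step this sketch does not attempt.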

Keywords: CNN; Feature extraction; PLS; Variables selection.

MeSH terms

  • Least-Squares Analysis
  • Monte Carlo Method
  • Neural Networks, Computer*
  • Spectrophotometry, Infrared