Data dimensionality reduction and data fusion for fast characterization of green coffee samples using hyperspectral sensors

Anal Bioanal Chem. 2016 Oct;408(26):7351-66. doi: 10.1007/s00216-016-9713-7. Epub 2016 Jun 24.

Abstract

Hyperspectral sensors represent a powerful tool for chemical mapping of solid-state samples, since they provide spectral information localized in the image domain in very short times and without the need for sample pretreatment. However, due to the large data size of each hyperspectral image, data dimensionality reduction (DR) is necessary in order to develop hyperspectral sensors for real-time monitoring of large sets of samples with different characteristics. In particular, in this work, we focused on DR methods to convert the three-dimensional data array corresponding to each hyperspectral image into a one-dimensional signal (1D-DR), which retains spectral and/or spatial information. In this way, large datasets of hyperspectral images can be converted into matrices of signals, which in turn can be easily processed using suitable multivariate statistical methods. Different 1D-DR methods highlight different aspects of the hyperspectral image dataset. Therefore, in order to investigate their advantages and disadvantages, in this work, we compared three different 1D-DR methods: average spectrum (AS), single space hyperspectrogram (SSH) and common space hyperspectrogram (CSH). In particular, we considered 370 NIR-hyperspectral images of a set of green coffee samples, and the three 1D-DR methods were tested for their effectiveness in sensor fault detection, data structure exploration and sample classification according to coffee variety and to coffee processing method. Principal component analysis and partial least squares-discriminant analysis were used to compare the three separate DR methods. Furthermore, low-level and mid-level data fusion were also employed to test the advantages of using AS, SSH and CSH all together.

Graphical Abstract: Key steps in hyperspectral data dimensionality reduction.
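
As a rough illustration of the simplest 1D-DR method named above, the average spectrum (AS), the sketch below collapses each three-dimensional image array into a single mean spectrum and stacks the results into a matrix of signals. It also shows one common way to perform low-level fusion, namely column-wise concatenation of block-scaled signal matrices. The function names, the (rows, cols, bands) array layout, the random test data and the scaling choice are all assumptions made for illustration; none of them are taken from the paper, and the SSH and CSH methods are not reproduced here.

```python
import numpy as np


def average_spectrum(cube):
    """Collapse a (rows, cols, bands) hypercube into one mean spectrum.

    The axis order is an assumption; real instruments may store data differently.
    """
    cube = np.asarray(cube, dtype=float)
    if cube.ndim != 3:
        raise ValueError("expected a 3-D array (rows, cols, bands)")
    # Unfold the two spatial axes into one pixel axis, then average over pixels.
    return cube.reshape(-1, cube.shape[2]).mean(axis=0)


def signals_matrix(cubes):
    """Stack one average spectrum per image into an (images, bands) matrix."""
    return np.vstack([average_spectrum(c) for c in cubes])


def low_level_fusion(blocks):
    """Concatenate signal blocks column-wise after simple block scaling.

    Each block is mean-centered and divided by its Frobenius norm so that
    no block dominates only because it has more variables. This scaling is
    one possible choice, not necessarily the one used in the paper.
    """
    scaled = []
    for block in blocks:
        block = np.asarray(block, dtype=float)
        centered = block - block.mean(axis=0)
        norm = np.linalg.norm(centered)
        scaled.append(centered / norm if norm > 0 else centered)
    return np.hstack(scaled)


# Hypothetical data: three small random "images" with 6 spectral bands each.
rng = np.random.default_rng(0)
cubes = [rng.random((4, 5, 6)) for _ in range(3)]
X_as = signals_matrix(cubes)  # one row per image, one column per band
```

The resulting matrix has one row per image, so it can be passed directly to multivariate methods such as principal component analysis. Because every image is averaged on its own, images with different spatial sizes can be handled in the same way, provided they share the same spectral bands.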

Keywords: Data dimensionality reduction; Data fusion; Fast exploration; Green coffee; Hyperspectral imaging; Multivariate classification.

MeSH terms

  • Coffee / chemistry*
  • Data Compression / methods*
  • Discriminant Analysis
  • Image Processing, Computer-Assisted / methods*
  • Least-Squares Analysis
  • Principal Component Analysis
  • Spectroscopy, Near-Infrared / methods*

Substances

  • Coffee