Handling large datasets of hyperspectral images: reducing data size without loss of useful information

Anal Chim Acta. 2013 Nov 13:802:29-39. doi: 10.1016/j.aca.2013.10.009. Epub 2013 Oct 11.

Abstract

Hyperspectral Imaging (HSI) is gaining increasing interest in the field of analytical chemistry, since this fast and non-destructive technique allows one to easily acquire a large amount of spectral and spatial information on a large number of samples in a very short time. However, the large size of hyperspectral image data often limits the possible uses of this technique, because it is difficult to evaluate many samples together, for example when a representative number of samples must be considered for the implementation of on-line applications. To solve this problem, we propose a novel chemometric strategy aimed at significantly reducing the dataset size, which makes it possible to analyze from tens up to hundreds of hyperspectral images together in a completely automated way, without losing either spectral or spatial information. The approach essentially consists of compressing each hyperspectral image into a signal, named hyperspectrogram, which is created by combining several quantities obtained by applying PCA to each single hyperspectral image. Hyperspectrograms can then be used as a compact set of descriptors and subjected to blind analysis techniques. Moreover, both data compression and calibration/classification performance can be further improved by applying proper variable selection methods to the hyperspectrograms. The correctness of the choices made by the algorithm can be evaluated visually by representing the selected features back in the original image domain. Likewise, the chemical information underlying the selected regions of the hyperspectrograms related to the loadings can be interpreted by projecting them into the original spectral domain.
Examples of applications of the hyperspectrogram-based approach to hyperspectral images of food samples in the NIR range (1000-1700 nm) and in the vis-NIR range (400-1000 nm), addressing a calibration problem and a defect detection problem, respectively, demonstrate the effectiveness of the proposed approach.
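The abstract does not state exactly which PCA-derived quantities are combined into a hyperspectrogram, so the following Python/NumPy sketch is only an illustration under explicit assumptions. It unfolds one image into a pixels x wavelengths matrix, runs PCA via SVD, and concatenates relative-frequency curves (histograms) of the first few PC scores, of Hotelling T^2 and of Q residuals, followed by the PC loading vectors. This choice of quantities, the function name `hyperspectrogram` and the parameters `n_pc` and `n_bins` are hypothetical and are not the authors' published implementation.

```python
import numpy as np

def hyperspectrogram(cube, n_pc=3, n_bins=100):
    """Compress one hyperspectral image (rows x cols x bands) into a 1-D signal.

    ASSUMPTION (illustrative only): the signal concatenates frequency curves
    of the first n_pc score vectors, of Hotelling T^2 and of Q residuals,
    followed by the first n_pc loading vectors.
    """
    rows, cols, bands = cube.shape
    X = cube.reshape(rows * cols, bands).astype(float)  # unfold: pixels x wavelengths
    X = X - X.mean(axis=0)                               # mean-centre each band

    # PCA of the single image via SVD of the centred pixel matrix
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:n_pc].T                                      # loadings, bands x n_pc
    # Fix the arbitrary SVD sign: largest-magnitude element of each loading is positive
    idx = np.argmax(np.abs(P), axis=0)
    P = P * np.sign(P[idx, np.arange(n_pc)])
    T = X @ P                                            # scores, pixels x n_pc

    # Hotelling T^2: squared scores scaled by each component's variance
    var = s[:n_pc] ** 2 / (X.shape[0] - 1)
    t2 = np.sum(T ** 2 / var, axis=1)
    # Q residuals: squared reconstruction error of each pixel
    q = np.sum((X - T @ P.T) ** 2, axis=1)

    def freq_curve(v):
        # Relative frequencies; bin range is per image here, for simplicity
        counts, _ = np.histogram(v, bins=n_bins)
        return counts / v.size

    parts = [freq_curve(T[:, k]) for k in range(n_pc)]
    parts += [freq_curve(t2), freq_curve(q)]
    parts += [P[:, k] for k in range(n_pc)]
    return np.concatenate(parts)

# Usage on synthetic data (random cubes stand in for real images)
rng = np.random.default_rng(0)
h = hyperspectrogram(rng.random((20, 30, 50)), n_pc=3, n_bins=64)
print(h.shape)  # 3*64 + 2*64 + 3*50 = 470 values instead of 20*30*50 = 30000

# One row per image: a compact descriptor matrix for a whole dataset
D = np.vstack([hyperspectrogram(rng.random((10, 12, 50)), 3, 64) for _ in range(4)])
print(D.shape)  # (4, 470)
```

The descriptor matrix `D` is the kind of compact input on which calibration, classification or variable selection methods could then operate. Because each frequency curve in this sketch uses its own per-image bin range, a real comparison across images would need common bin edges fixed for the whole dataset.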

Keywords: Data compression; Hyperspectral imaging; Multivariate image analysis; Variable selection.

MeSH terms

  • Algorithms
  • Calibration
  • Databases, Factual*
  • Image Processing, Computer-Assisted*
  • Spectroscopy, Near-Infrared*