Comparison of preprocessing techniques to reduce nontissue-related variations in hyperspectral reflectance imaging

J Biomed Opt. 2022 Oct;27(10):106003. doi: 10.1117/1.JBO.27.10.106003.

Abstract

Significance: Hyperspectral reflectance imaging can be used in medicine to identify tissue types, such as tumor tissue. Tissue classification algorithms are developed using, for example, machine learning or principal component analysis. To develop these algorithms, data are generally preprocessed to remove variability that is not related to the tissue itself, because this improves the performance of the classification algorithm. In hyperspectral imaging, the measured spectra are also influenced by reflections from the surface (glare) and by height variations within and between tissue samples.

Aim: To compare the ability of different preprocessing algorithms to decrease variations in spectra induced by glare and height differences while maintaining contrast based on differences in optical properties between tissue types.

Approach: We compare eight preprocessing algorithms commonly used in medical hyperspectral imaging: standard normal variate, multiplicative scatter correction, min-max normalization, mean centering, area under the curve normalization, single wavelength normalization, first derivative, and second derivative. We investigate how well each algorithm preserves contrast stemming from differences in blood volume fraction, presence of different absorbers, scatter amplitude, and scatter slope, while correcting for glare and height variations. We use a similarity metric, the overlap coefficient, to quantify contrast between spectra. We also apply the algorithms to clinical datasets from the colon and breast.
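The eight methods listed above can be sketched in Python with NumPy. This is a minimal sketch of common textbook forms, assuming spectra are stored as rows of a float array `X` with a matching wavelength vector `wl`. The function names, the mean-spectrum reference for multiplicative scatter correction, per-spectrum mean centering, the hypothetical `ref_wl` parameter, and the plain finite-difference derivatives are illustrative assumptions, not necessarily the exact variants used in the study.

```python
import numpy as np

# Minimal sketches of the eight preprocessing methods named in the abstract.
# Assumed conventions: X holds one spectrum per row, wl is the matching
# 1-D array of wavelengths (nm). Not necessarily the study's exact variants.


def standard_normal_variate(X):
    # Subtract each spectrum's own mean and divide by its own standard deviation.
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)


def multiplicative_scatter_correction(X, reference=None):
    # Fit x ~ offset + slope * reference for each spectrum, then invert the fit.
    # Assumption: the reference defaults to the mean spectrum of X.
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    out = np.empty_like(X)
    for i, x in enumerate(X):
        slope, offset = np.polyfit(ref, x, deg=1)
        out[i] = (x - offset) / slope
    return out


def min_max_normalization(X):
    # Rescale each spectrum so that its minimum is 0 and its maximum is 1.
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=1, keepdims=True)
    hi = X.max(axis=1, keepdims=True)
    return (X - lo) / (hi - lo)


def mean_centering(X):
    # Assumption: per-spectrum centering (subtract each row's own mean).
    # Chemometric pipelines often subtract the dataset mean spectrum instead.
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=1, keepdims=True)


def area_under_curve_normalization(X, wl):
    # Divide each spectrum by its trapezoidal area over the wavelength range.
    X = np.asarray(X, dtype=float)
    area = np.sum(0.5 * (X[:, 1:] + X[:, :-1]) * np.diff(wl), axis=1, keepdims=True)
    return X / area


def single_wavelength_normalization(X, wl, ref_wl):
    # Divide each spectrum by its value at the sampled wavelength closest to
    # ref_wl (a hypothetical user choice; the abstract does not specify one).
    X = np.asarray(X, dtype=float)
    idx = int(np.argmin(np.abs(np.asarray(wl) - ref_wl)))
    return X / X[:, idx:idx + 1]


def first_derivative(X, wl):
    # Finite-difference derivative along wavelength. Savitzky-Golay filtering
    # is a common, more noise-robust alternative.
    return np.gradient(np.asarray(X, dtype=float), wl, axis=1)


def second_derivative(X, wl):
    # Apply the finite-difference derivative twice.
    return np.gradient(first_derivative(X, wl), wl, axis=1)
```

Written this way, the invariances are easy to check: standard normal variate, min-max normalization, and multiplicative scatter correction (for a fixed reference spectrum) cancel both a multiplicative factor and a constant offset; area under the curve and single wavelength normalization cancel only a multiplicative factor; mean centering and the first derivative cancel only a constant offset; and the second derivative additionally cancels a linear baseline.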

Conclusions: Preprocessing reduces the overlap caused by glare and distance variations. In general, standard normal variate, min-max, area under the curve, and single wavelength normalization are the most suitable algorithms for preprocessing data used to develop tissue classification algorithms. The type of contrast between tissue types determines which of these four algorithms is most suitable.
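The effect of preprocessing on overlap can be illustrated with a histogram estimate of the overlapping coefficient, OVL = sum over bins k of min(p_k, q_k), where p and q are the normalized histograms of two groups on shared bins (1 means identical distributions, 0 means disjoint ones). The sketch below assumes OVL is computed per wavelength and then averaged, and simulates two hypothetical tissue types whose spectra are distorted by a random multiplicative factor (standing in for height variation) and a spectrally flat additive offset (standing in for glare). The simulation model, the bin count, and all function names are assumptions for illustration, not the study's exact definition or data.

```python
import numpy as np


def overlap_coefficient(a, b, bins=30):
    # Histogram estimate of OVL on bins shared by both 1-D samples.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    return float(np.minimum(p / p.sum(), q / q.sum()).sum())


def mean_spectral_overlap(A, B, bins=30):
    # Assumption: OVL per wavelength (column), averaged over wavelengths.
    return float(np.mean([overlap_coefficient(A[:, j], B[:, j], bins)
                          for j in range(A.shape[1])]))


def snv(X):
    # Standard normal variate: per-spectrum mean removal and scaling.
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)


# Hypothetical demo: two "tissue types" with absorption dips at different
# wavelengths, distorted by random height scaling and flat glare offsets.
rng = np.random.default_rng(0)
wl = np.linspace(500, 1000, 101)
base_a = 1.0 - 0.5 * np.exp(-((wl - 650) / 40) ** 2)
base_b = 1.0 - 0.5 * np.exp(-((wl - 850) / 40) ** 2)


def simulate(base, n=200):
    scale = rng.uniform(0.5, 2.0, size=(n, 1))   # multiplicative: height variation
    glare = rng.uniform(0.0, 0.5, size=(n, 1))   # additive: spectrally flat glare
    noise = rng.normal(0.0, 0.01, size=(n, base.size))
    return scale * base + glare + noise


A, B = simulate(base_a), simulate(base_b)
before = mean_spectral_overlap(A, B)
after = mean_spectral_overlap(snv(A), snv(B))
print(f"mean overlap before SNV: {before:.2f}, after SNV: {after:.2f}")
```

In this toy model, standard normal variate cancels both distortions exactly, so the remaining overlap is driven by the true spectral differences and the noise. At wavelengths where the two base spectra coincide, the overlap stays high after preprocessing, as it should, since there is no tissue contrast there to preserve.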

Keywords: cancer; classification; glare; hyperspectral; machine learning; normalization; preprocessing; scatter correction.

MeSH terms

  • Algorithms
  • Hyperspectral Imaging*
  • Principal Component Analysis
  • Spectroscopy, Near-Infrared
  • Support Vector Machine*