Sparse wavelengths data in mid-infrared spectroscopy: Modelling approaches and channel sampling

J Biophotonics. 2023 Oct;16(10):e202300049. doi: 10.1002/jbio.202300049. Epub 2023 Jul 21.

Abstract

Infrared instruments with smaller, more cost-effective components, such as bandpass filters, single-channel detectors, and laser-based light sources, are being developed to provide cheaper and faster analysis of biological samples. Such instruments often provide measurements in the form of sparse data, which comprise either a collection of single-frequency channels or a collection of channels covering very narrow spectral ranges, called here multi-frequency channels. To keep costs low, the number of channels must be kept to a minimum. However, modelling and preprocessing of sparse data require enough channels to perform the task. The aim of this study was therefore to understand the effect of channel sampling on data modelling results and to find the optimal modelling algorithm for different types of sparse data. The sparse data were simulated using Fourier transform infrared spectra of milk and fungi. Regression models were established to predict fatty acid composition using partial least squares regression (PLSR), multiple linear regression (MLR), and random forest (RF) methods. We observe that the PLSR algorithm is very well suited to sparse data in the form of multi-frequency channels: excellent calibration models were obtained with only three channels comprising three wavenumbers each, and the results were comparable to those obtained with full spectra. MLR and RF, in turn, provided similarly good results using data with single-frequency channels, requiring nine channels in total.

Keywords: infrared lasers; machine learning; pre-processing; sparse spectra data.
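
Illustrative sketch (not from the paper): the abstract distinguishes single-frequency channels from multi-frequency channels but does not describe how the sparse data were simulated. The Python snippet below shows one plausible way to reduce a full spectrum to either kind of sparse data by picking grid points around chosen channel centres. The function names (`nearest_index`, `sample_channels`, `reduce_spectrum`), the synthetic wavenumber grid, the placeholder absorbances, and all channel centres are assumptions made for illustration only; they are not the channels or the procedure used in the study.

```python
from bisect import bisect_left


def nearest_index(grid, target):
    """Return the index of the point in an ascending grid closest to target."""
    pos = bisect_left(grid, target)
    if pos == 0:
        return 0
    if pos == len(grid):
        return len(grid) - 1
    # Pick whichever neighbour is closer; ties go to the lower index.
    return pos if grid[pos] - target < target - grid[pos - 1] else pos - 1


def sample_channels(grid, centres, points_per_channel=1):
    """For each centre, return the indices of adjacent grid points around it.

    points_per_channel=1 mimics single-frequency channels; a larger value
    mimics multi-frequency channels covering a very narrow spectral range.
    """
    half = points_per_channel // 2
    channels = []
    for centre in centres:
        mid = nearest_index(grid, centre)
        # Clamp the window so it stays inside the grid.
        start = min(max(mid - half, 0), len(grid) - points_per_channel)
        channels.append(list(range(start, start + points_per_channel)))
    return channels


def reduce_spectrum(spectrum, channels):
    """Keep only the absorbance values that fall inside the sampled channels."""
    return [spectrum[i] for channel in channels for i in channel]


# Synthetic grid from 1000 to 1800 cm^-1 in steps of 2 cm^-1 (assumed).
grid = [1000 + 2 * i for i in range(401)]
# Placeholder absorbances, NOT real milk or fungi spectra.
spectrum = [0.001 * i for i in range(401)]

# Three multi-frequency channels with three wavenumbers each (hypothetical centres).
multi = sample_channels(grid, [1160, 1460, 1745], points_per_channel=3)
# Nine single-frequency channels (hypothetical centres).
single = sample_channels(
    grid, [1050, 1100, 1160, 1240, 1380, 1460, 1540, 1650, 1745]
)

sparse_multi = reduce_spectrum(spectrum, multi)
sparse_single = reduce_spectrum(spectrum, single)
```

In this sketch both configurations yield nine variables per spectrum, which matches the channel counts quoted in the abstract. The reduced vectors could then be passed to any regression method, for example PLSR, MLR, or RF.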