Reduction of the Number of Samples for Cost-Effective Hyperspectral Grape Quality Predictive Models

Foods. 2021 Jan 23;10(2):233. doi: 10.3390/foods10020233.

Abstract

Developing chemometric models from near-infrared (NIR) spectra requires a calibration set that is representative of the entire population. Consequently, calibration generally demands considerable resources, and there is great interest in identifying the most spectrally representative samples within a large population. In this study, principal component analysis and hierarchical clustering analysis were compared for their ability to provide representative calibration sets. The resulting calibration sets were used to monitor the technological maturity of grapes and the total phenolic compounds of grape skins in red and white cultivars. Finally, the accuracy and precision of the models built from the calibration sets produced by each selection algorithm were compared with each other and with those of models built from the whole sample set, using an external validation set. Most of the standard errors of prediction (SEP) in external validation obtained from the reduced data sets were not significantly different from those obtained using the whole data set. Moreover, sample subsets resulting from hierarchical clustering analysis appear to produce slightly better results.

Keywords: chemometrics; grape quality; hyperspectral imaging; near-infrared; sample selection.
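
The idea of picking spectrally representative samples by hierarchical clustering can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not specify the linkage, the distance metric, or the rule for choosing a sample from each cluster, so average linkage, Euclidean distance, and "sample nearest to the cluster centroid" are all assumptions made here. The function name `select_representatives` and the toy data, which stand in for low-dimensional scores of real spectra, are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def select_representatives(samples, n_select):
    """Return sorted indices of n_select representative samples.

    Assumed scheme: agglomerative clustering with average linkage
    until n_select clusters remain, then keep the member of each
    cluster that lies closest to that cluster's centroid.
    """
    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > n_select:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between clusters.
                total = sum(
                    euclidean(samples[i], samples[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    selected = []
    for members in clusters:
        c = centroid([samples[i] for i in members])
        selected.append(min(members, key=lambda i: euclidean(samples[i], c)))
    return sorted(selected)


# Toy stand-in for spectral scores: three well-separated groups.
toy = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.2],
    [10.0, 0.0], [10.2, 0.1],
]
picked = select_representatives(toy, 3)
print(picked)
```

On this toy input the function keeps one sample from each of the three groups, so a calibration set of three samples replaces the original nine. In practice the clustering would typically be run on pre-processed spectra or on their principal component scores, and the pairwise search shown here is quadratic per merge, so it only suits small sample sets.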