Feature Reduction for the Classification of Bruise Damage to Apple Fruit Using a Contactless FT-NIR Spectroscopy with Machine Learning

Foods. 2023 Jan 3;12(1):210. doi: 10.3390/foods12010210.

Abstract

Spectroscopy data are useful for modelling biological systems such as predicting quality parameters of horticultural products. However, using the wide spectrum of wavelengths is not practical in a production setting. Such data are of high dimensional nature and they tend to result in complex models that are not easily understood. Furthermore, collinearity between different wavelengths dictates that some of the data variables are redundant and may even contribute noise. The use of variable selection methods is one efficient way to obtain an optimal model, andthis was the aim of this work. Taking advantage of a non-contact spectrometer, near infrared spectral data in the range of 800-2500 nm were used to classify bruise damage in three apple cultivars, namely 'Golden Delicious', 'Granny Smith' and 'Royal Gala'. Six prominent machine learning classification algorithms were employed, and two variable selection methods were used to determine the most relevant wavelengths for the problem of distinguishing between bruised and non-bruised fruit. The selected wavelengths clustered around 900 nm, 1300 nm, 1500 nm and 1900 nm. The best results were achieved using linear regression and support vector machine based on up to 40 wavelengths: these methods reached precision values in the range of 0.79-0.86, which were all comparable (within error bars) to a classifier based on the entire range of frequencies. The results also provided an open-source based framework that is useful towards the development of multi-spectral applications such as rapid grading of apples based on mechanical damage, and it can also be emulated and applied for other types of defects on fresh produce.

Keywords: apples; baseline; bruise damage; defect classification; feature reduction; machine learning; model optimisation; quality control; uncertainty quantification; variable selection.