Comparing Machine Learning and PLSDA Algorithms for Durian Pulp Classification Using Inline NIR Spectra

Sensors (Basel). 2023 Jun 4;23(11):5327. doi: 10.3390/s23115327.

Abstract

The aim of this study was to evaluate and compare the performance of multivariate classification algorithms, specifically Partial Least Squares Discriminant Analysis (PLS-DA) and machine learning algorithms, in the classification of Monthong durian pulp based on its dry matter content (DMC) and soluble solid content (SSC), using the inline acquisition of near-infrared (NIR) spectra. A total of 415 durian pulp samples were collected and analyzed. Raw spectra were preprocessed using five different combinations of spectral preprocessing techniques: Moving Average with Standard Normal Variate (MA+SNV), Savitzky-Golay Smoothing with Standard Normal Variate (SG+SNV), Mean Normalization (SG+MN), Baseline Correction (SG+BC), and Multiplicative Scatter Correction (SG+MSC). The results revealed that the SG+SNV preprocessing technique produced the best performance with both the PLS-DA and machine learning algorithms. The optimized wide neural network algorithm of machine learning achieved the highest overall classification accuracy of 85.3%, outperforming the PLS-DA model, with overall classification accuracy of 81.4%. Additionally, evaluation metrics such as recall, precision, specificity, F1-score, AUC ROC, and kappa were calculated and compared between the two models. The findings of this study demonstrate the potential of machine learning algorithms to provide similar or better performance compared to PLS-DA in classifying Monthong durian pulp based on DMC and SSC using NIR spectroscopy, and they can be applied in the quality control and management of durian pulp production and storage.

Keywords: Monthong durian; NIR spectroscopy; Partial Least Squares Discriminant Analysis (PLS-DA); dry matter content; machine learning; multivariate classification algorithms; neural network; soluble solid content.

MeSH terms

  • Algorithms
  • Bombacaceae*
  • Least-Squares Analysis
  • Neural Networks, Computer
  • Spectroscopy, Near-Infrared / methods
  • Support Vector Machine