Assessment of lemon juice quality and adulteration by ultra-high performance liquid chromatography/triple quadrupole mass spectrometry with interactive and interpretable machine learning

J Food Drug Anal. 2021 Jun 15;29(2):275-286. doi: 10.38212/2224-6614.3356.

Abstract

A total of 81 lemon juices samples were detected using an optimized UHPLC-QqQ-MS/MS method and colorimetric assays. Concentration of 3 organic acids (ascorbic acid, malic acid and citric acid), 3 saccharides (glucose, fructose and sucrose) and 6 phenolic acids (trans-p-coumaric acid, 3-hydroxybenzoic acid, 4-hydroxybenzoic acid, 3,4-dihydroxybenzoic acid, caffeic acid) were quantified. Their total polyphenol, antioxidant activity and Ferric reducing antioxidant power were also measured. For the prediction of authentic and adulterated lemon juices and commercially sourced lemonade beverages based on the acquired metabolic profile, machine learning models including linear discriminant analysis, Gaussian naïve Bayes, lasso-regularized logistic regression, random forest (RF) and support vector machine were developed based on training (70%)-cross-validation-testing (30%) workflow. The predicted accuracy on the testing set is 73-86% for different models. Individual conditional expectation analysis (how predicted probabilities change when the feature magnitude changes) was applied for model interpretation, which in particular revealed the close association of RF-probability prediction with nuance characteristics of the density distribution of metabolic features. Using established models, an open-source online dashboard was constructed for convenient classification prediction and interactive visualization in real practice.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Beverages* / analysis
  • Chromatography, High Pressure Liquid / methods
  • Machine Learning
  • Tandem Mass Spectrometry* / methods