A comparative UHPLC-Q/TOF-MS-based metabolomics approach coupled with machine learning algorithms to differentiate Keemun black teas from narrow-geographic origins

Food Res Int. 2022 Aug:158:111512. doi: 10.1016/j.foodres.2022.111512. Epub 2022 Jun 22.

Abstract

Geographic labeling is a distinctive feature of Chinese tea products. In this study, a UHPLC-Q/TOF-MS-based metabolomics approach coupled with chemometrics was used to discriminate Keemun black tea from five narrow-geographic origins. Thirty-nine differential compounds (variable importance in projection, VIP > 1) were identified, of which eight were quantified. Chemometric analysis showed that the linear discriminant analysis (LDA) model achieved a classification accuracy of 91.7%, with a cross-validation accuracy of 84.7%. Three machine learning algorithms, namely feedforward neural network (FNN), random forest (RF) and support vector machine (SVM), were introduced to improve the recognition of narrow-geographic origins, and model performance was evaluated using the confusion matrix, the receiver operating characteristic (ROC) curve and the area under the curve (AUC). The recognition accuracies of RF, SVM and FNN for Keemun black tea from the five narrow-geographic origins were 87.5%, 94.44% and 100%, respectively, so FNN gave the best classification performance. The results indicate that metabolomics fingerprints coupled with chemometrics can be used to authenticate the narrow-geographic origins of Keemun black teas.
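The comparison described in the abstract (LDA, RF, SVM and FNN, assessed by confusion matrix and ROC AUC) can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes scikit-learn, uses a synthetic feature table in place of the real metabolite data, and treats the sample count, hyperparameters, 5-fold cross-validation scheme and the helper name `evaluate` as illustrative assumptions. Only the five classes and the 39 features echo numbers given in the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a metabolite intensity table (assumption):
# 100 samples, 39 features, 5 classes playing the role of the origins.
X, y = make_classification(
    n_samples=100, n_features=39, n_informative=10, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, class_sep=2.0, random_state=0,
)

# Hyperparameters below are arbitrary choices, not values from the study.
models = {
    "LDA": make_pipeline(StandardScaler(), LinearDiscriminantAnalysis()),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "FNN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    ),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def evaluate(model, X, y, cv):
    """Return the confusion matrix, accuracy and macro one-vs-rest AUC
    computed from out-of-fold class probabilities."""
    proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")
    pred = proba.argmax(axis=1)  # labels are 0..4, so column index == label
    cm = confusion_matrix(y, pred)
    acc = float(np.trace(cm)) / cm.sum()
    auc = roc_auc_score(y, proba, multi_class="ovr")
    return cm, acc, auc


results = {name: evaluate(model, X, y, cv) for name, model in models.items()}
for name, (cm, acc, auc) in results.items():
    print(f"{name}: accuracy={acc:.3f}, macro OvR AUC={auc:.3f}")
```

Every prediction here comes from a model that did not see that sample during fitting, which corresponds to the cross-validation figure quoted for LDA rather than to a resubstitution accuracy. The numbers printed for synthetic data say nothing about the tea results.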

Keywords: Camellia sinensis tea; Machine learning algorithms; Metabolomics fingerprints; Narrow-geographic origin; Phenolic compounds.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Camellia sinensis*
  • Chromatography, High Pressure Liquid
  • Machine Learning
  • Metabolomics
  • Tea*

Substances

  • Tea