Identify production area, growth mode, species, and grade of Astragali Radix using metabolomics "big data" and machine learning

Phytomedicine. 2024 Jan:123:155201. doi: 10.1016/j.phymed.2023.155201. Epub 2023 Nov 8.

Abstract

Background: Astragali Radix (AR) is a widely used herbal medicine. The quality of AR is influenced by several key factors, including the production area, growth mode, species, and grade. However, the markers currently used to distinguish these factors are primarily secondary metabolites, and they have not been validated on large sample sets.

Purpose: This study aims to discover reliable markers and develop classification models for identifying the production area, growth mode, species, and grade of AR.

Methods: A total of 366 batches of AR crude slices were collected from six provinces in China and divided into learning (n = 191) and validation (n = 175) sets. Three ultra-performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) methods were developed and validated for determining 22 primary and 10 secondary metabolites in AR methanol extract. Based on the quantification data, seven machine learning algorithms, such as Nearest Neighbors and Gradient Boosted Trees, were applied to screen the potential markers and build the classification models for identifying the four factors associated with AR quality.
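The learning/validation workflow described in the Methods can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the labels `mode_A` and `mode_B` are hypothetical, the three features stand in for metabolite concentrations, and `k = 5` and z-score scaling are arbitrary choices. Only the 191/175 split sizes come from the abstract. The sketch shows a plain k-nearest-neighbors classifier, not the authors' implementation.

```python
import math
import random
from collections import Counter


def standardize(train, test):
    """Z-score each feature using statistics from the learning set only."""
    n_features = len(train[0])
    means = [sum(row[j] for row in train) / len(train) for j in range(n_features)]
    stds = []
    for j in range(n_features):
        var = sum((row[j] - means[j]) ** 2 for row in train) / len(train)
        stds.append(math.sqrt(var) or 1.0)  # guard against constant features

    def scale(rows):
        return [[(r[j] - means[j]) / stds[j] for j in range(n_features)] for r in rows]

    return scale(train), scale(test)


def knn_predict(train_x, train_y, query, k=5):
    """Return the majority label among the k nearest learning samples."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Synthetic stand-in for quantification data (NOT real measurements).
random.seed(0)


def make_sample(label):
    center = 0.0 if label == "mode_A" else 3.0
    return [random.gauss(center, 1.0) for _ in range(3)], label


samples = [make_sample("mode_A" if i % 2 == 0 else "mode_B") for i in range(366)]
random.shuffle(samples)

# Split sizes mirror the abstract: learning n = 191, validation n = 175.
learn, valid = samples[:191], samples[191:]
learn_x, learn_y = [s[0] for s in learn], [s[1] for s in learn]
valid_x, valid_y = [s[0] for s in valid], [s[1] for s in valid]
learn_x, valid_x = standardize(learn_x, valid_x)

predictions = [knn_predict(learn_x, learn_y, q) for q in valid_x]
accuracy = sum(p == t for p, t in zip(predictions, valid_y)) / len(valid_y)
print(f"validation accuracy: {accuracy:.3f}")
```

Scaling with learning-set statistics only keeps information from the validation set out of model fitting, which is the usual reason for holding out a separate validation set in the first place.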

Results: Our analysis revealed that secondary metabolites (e.g., astragaloside IV, calycosin-7-O-β-D-glucoside, and ononin) played a crucial role in evaluating AR quality, particularly in identifying the production area and species. Additionally, fatty acids (e.g., behenic acid and lignoceric acid) were vital in determining the growth mode of AR, while amino acids (e.g., alanine and phenylalanine) were helpful in distinguishing different grades. Using both primary and secondary metabolites, Nearest Neighbors-based models were constructed to identify each factor of AR, achieving good classification accuracy (>70%) on the validation set. Furthermore, a panel of four metabolites (ononin, astragaloside II, pentadecanoic acid, and alanine) allowed for simultaneous identification of all four factors of AR, offering an accuracy of 86.9%.
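The abstract does not state how accuracy for "simultaneous identification" is scored. One possible reading, sketched below purely as an assumption, is that a sample counts as correct only when all four factors are predicted correctly. The records, label values, and function names are hypothetical and are not taken from the study.

```python
FACTORS = ("area", "growth_mode", "species", "grade")


def per_factor_accuracy(truth, predicted, factor):
    """Fraction of samples whose prediction for one factor is correct."""
    hits = sum(t[factor] == p[factor] for t, p in zip(truth, predicted))
    return hits / len(truth)


def joint_accuracy(truth, predicted):
    """Fraction of samples for which all four factors are correct at once."""
    hits = sum(all(t[f] == p[f] for f in FACTORS) for t, p in zip(truth, predicted))
    return hits / len(truth)


# Hypothetical toy records (placeholder labels, not study data).
truth = [
    {"area": "P1", "growth_mode": "G1", "species": "S1", "grade": "A"},
    {"area": "P1", "growth_mode": "G2", "species": "S1", "grade": "B"},
    {"area": "P2", "growth_mode": "G1", "species": "S2", "grade": "A"},
    {"area": "P2", "growth_mode": "G2", "species": "S2", "grade": "B"},
]
predicted = [
    dict(truth[0]),                 # all four factors correct
    {**truth[1], "grade": "A"},     # grade wrong
    dict(truth[2]),                 # all four factors correct
    {**truth[3], "area": "P1"},     # production area wrong
]

for factor in FACTORS:
    print(factor, per_factor_accuracy(truth, predicted, factor))
print("joint", joint_accuracy(truth, predicted))
```

Under this reading the joint score can never exceed the lowest per-factor score, so the two kinds of figure are not directly comparable.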

Conclusion: Our findings highlight the potential of integrating large-scale targeted metabolomics and machine learning approaches to accurately identify the quality-associated factors of AR. This study opens up possibilities for enhancing the evaluation of other herbal medicines through similar methodologies, and further exploration in this area is warranted.

Keywords: Astragalus; Classification; Quality marker; Targeted metabolomics; UPLC-MS/MS.

MeSH terms

  • Alanine
  • Astragalus Plant*
  • Astragalus propinquus / chemistry
  • Chromatography, High Pressure Liquid / methods
  • Chromatography, Liquid
  • Drugs, Chinese Herbal* / pharmacology
  • Tandem Mass Spectrometry / methods

Substances

  • Drugs, Chinese Herbal
  • Alanine