Employing fingerprinting of medicinal plants by means of LC-MS and machine learning for species identification task

Sci Rep. 2018 Nov 19;8(1):17053. doi: 10.1038/s41598-018-35399-z.

Abstract

A dataset of liquid chromatography-mass spectrometry measurements of medicinal plant extracts from 74 species was generated and used for training and validating plant species identification algorithms. Various strategies for data handling and feature space extraction were tested. Constrained Tucker decomposition, large-scale (more than 1500 variables) discrete Bayesian Networks and autoencoder based dimensionality reduction coupled with continuous Bayes classifier and logistic regression were optimized to achieve the best accuracy. Even with elimination of all retention time values accuracies of up to 96% and 92% were achieved on validation set for plant species and plant organ identification respectively. Benefits and drawbacks of used algortihms were discussed. Preliminary test showed that developed approaches exhibit tolerance to changes in data created by using different extraction methods and/or equipment. Dataset with more than 2200 chromatograms was published in an open repository.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Chromatography, Liquid / methods*
  • Machine Learning*
  • Mass Spectrometry / methods*
  • Plants, Medicinal / classification*
  • Species Specificity