Machine learning-based monosaccharide profiling for tissue-specific classification of Wolfiporia extensa samples

Carbohydr Polym. 2023 Dec 15;322:121338. doi: 10.1016/j.carbpol.2023.121338. Epub 2023 Aug 28.

Abstract

Machine learning (ML) has been used in many clinical decision-making and diagnostic procedures in bioinformatics applications. We examined eight algorithms: linear discriminant analysis (LDA), logistic regression (LR), k-nearest neighbor (KNN), random forest (RF), gradient boosting machine (GBM), support vector machine (SVM), naïve Bayes (NB), and artificial neural network (ANN). We evaluated how well each classifies and predicts four tissue types of Wolfiporia extensa from their monosaccharide composition profiles. All eight ML-based models performed well, with an area under the receiver operating characteristic curve (AUC) exceeding 0.8. Five models (LDA, KNN, RF, GBM, and ANN) performed excellently in the four-tissue-type classification (AUC > 0.9). All eight models were also good predictive models (AUC > 0.8) in the three-tissue-type classification. Notably, all eight ML-based methods outperformed the standalone LDA plotting method. For large sample sizes, ML-based methods perform better than traditional regression techniques and could potentially improve the accuracy of identifying W. extensa tissue samples.
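The abstract does not state how the multiclass AUC values were computed. A common convention is one-vs-rest: each tissue type is scored against all the others, and the per-class AUCs are averaged (macro average). Each binary AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below assumes that convention and pairs it with a minimal k-nearest-neighbor classifier (k = 3, Euclidean distance) over composition vectors. The three features (glucose, mannose, galactose fractions), the tissue labels A/B/C, and every number are hypothetical toy values, not data from the study. The function names are illustrative only.

```python
import math

# Hypothetical toy data: each profile is (glucose, mannose, galactose) molar fractions.
# Labels "A", "B", "C" stand in for tissue types; none of these values come from the study.
TRAIN_X = [
    (0.80, 0.10, 0.10), (0.78, 0.12, 0.10), (0.82, 0.08, 0.10),  # tissue A
    (0.50, 0.30, 0.20), (0.52, 0.28, 0.20), (0.48, 0.32, 0.20),  # tissue B
    (0.30, 0.20, 0.50), (0.32, 0.18, 0.50), (0.28, 0.22, 0.50),  # tissue C
]
TRAIN_Y = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
CLASSES = ["A", "B", "C"]


def knn_class_probs(train_x, train_y, x, classes, k=3):
    """Score each class by the fraction of the k nearest (Euclidean) training profiles with that label."""
    ranked = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_x, train_y))
    nearest = [label for _, label in ranked[:k]]
    return [nearest.count(c) / k for c in classes]


def binary_auc(scores, is_positive):
    """AUC as the Mann-Whitney statistic: P(random positive outscores random negative), ties count 0.5."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(prob_rows, labels, classes):
    """One-vs-rest AUC for each class, then an unweighted (macro) mean across classes."""
    per_class = {}
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]
        per_class[cls] = binary_auc(scores, [y == cls for y in labels])
    return sum(per_class.values()) / len(classes), per_class


if __name__ == "__main__":
    # Hypothetical held-out profiles; the last one lies between tissues A and B.
    test_x = [(0.79, 0.11, 0.10), (0.51, 0.29, 0.20), (0.31, 0.19, 0.50),
              (0.60, 0.25, 0.15), (0.66, 0.20, 0.14)]
    test_y = ["A", "B", "C", "B", "A"]
    probs = [knn_class_probs(TRAIN_X, TRAIN_Y, x, CLASSES) for x in test_x]
    macro, per_class = macro_ovr_auc(probs, test_y, CLASSES)
    print("per-class AUC:", per_class)
    print("macro AUC:", macro)
```

On this deliberately well-separated toy set, every class is ranked perfectly, so the macro AUC is 1.0. The ambiguous last profile only receives a score of 2/3 for tissue A. Real composition profiles would overlap more, which is where the 0.8 and 0.9 thresholds quoted above become informative. Because AUC depends only on how samples are ranked, any of the eight models could be dropped in place of the kNN scorer, provided it outputs a per-class score.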

Keywords: Linear discriminant analysis; Machine learning; Predictive model; Tissue-specific classification; Wolfiporia extensa.

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Machine Learning
  • Neural Networks, Computer
  • Wolfiporia*