A Machine Learning Approach for Recommending Herbal Formulae with Enhanced Interpretability and Applicability

Biomolecules. 2022 Oct 31;12(11):1604. doi: 10.3390/biom12111604.

Abstract

Herbal formulae (HFs) are representative interventions in Korean medicine (KM) for the prevention and treatment of various diseases. Here, we propose a machine learning-based approach for HF recommendation with enhanced interpretability and applicability. A dataset consisting of clinical symptoms, Sasang constitution (SC) types, and prescribed HFs was derived from a multicenter study. Case studies published over a 10-year period were collected and curated by experts. Various classifiers, oversampling methods, and data imputation techniques were comprehensively considered. The local interpretable model-agnostic explanations (LIME) technique was applied to identify the clinical symptoms that led to the recommendation of specific HFs. We found that the cascaded deep forest (CDF) model with data imputation and oversampling yielded the best performance on the training set and holdout test set. Our model also achieved top-1 and top-3 accuracies of 0.35 and 0.89, respectively, on case study datasets in which clinical symptoms were only partially recorded. We performed an expert evaluation of the reliability of the interpretation results using case studies and achieved a score close to normal. Taken together, our model will contribute to the modernization of KM and the identification of an HF selection process through the development of a practically useful HF recommendation model.
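For readers unfamiliar with the top-1 and top-3 accuracies reported above, the following is a minimal sketch of how such a metric is commonly computed from per-class scores. The function name `top_k_accuracy`, the toy scores, and the four hypothetical formula classes are illustrative assumptions and do not come from the study's code or data.

```python
from typing import Sequence


def top_k_accuracy(
    y_true: Sequence[int],
    scores: Sequence[Sequence[float]],
    k: int,
) -> float:
    """Fraction of cases whose true label is among the k highest-scored labels."""
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one case is required")
    hits = 0
    for label, row in zip(y_true, scores):
        # Rank class indices by descending score (stable sort: ties keep index order).
        ranked = sorted(range(len(row)), key=lambda j: -row[j])
        if label in ranked[:k]:
            hits += 1
    return hits / len(y_true)


# Made-up scores for 4 cases over 4 hypothetical formulae (not data from the study).
toy_scores = [
    [0.6, 0.2, 0.1, 0.1],  # true class 0 is ranked first
    [0.3, 0.4, 0.2, 0.1],  # true class 0 is ranked second
    [0.1, 0.2, 0.3, 0.4],  # true class 0 is ranked last
    [0.1, 0.5, 0.3, 0.1],  # true class 2 is ranked second
]
toy_true = [0, 0, 0, 2]

top1 = top_k_accuracy(toy_true, toy_scores, k=1)
top3 = top_k_accuracy(toy_true, toy_scores, k=3)
print(top1, top3)  # 0.25 0.75
```

A large gap between top-1 and top-3, as in the reported 0.35 versus 0.89, indicates that the correct formula is usually among the model's leading candidates even when it is not ranked first.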

Keywords: Korean medicine; LIME; herbal formula; recommendation model.

Publication types

  • Multicenter Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Machine Learning*
  • Reproducibility of Results

Grants and funding

This work was supported by National Research Foundation of Korea (NRF) grants funded by the Korean government (MSIT) (no. 2022R1I1A2066653 and 2020R1F1A1065731) and by the B.I.G. project of the Korea Institute of Oriental Medicine (KIOM) (no. KSN2023120).