AI-assisted literature exploration of innovative Chinese medicine formulas

Front Pharmacol. 2024 Mar 22:15:1347882. doi: 10.3389/fphar.2024.1347882. eCollection 2024.

Abstract

Objective: Our study provides an innovative approach to exploring herbal formulas that promote sustainability and biodiversity conservation. We employ data mining, integrating keyword extraction, association rules, and LSTM-based generative models to analyze classical Traditional Chinese Medicine (TCM) texts. We systematically decode classical Chinese medical literature, conduct statistical analyses, and link these historical texts with modern pharmacogenomic references to explore potential alternatives.

Methods: We present a novel iterative keyword extraction approach for identifying diverse herbs in historical TCM texts from the Pu-Ji Fang copies. Using association rules, we uncover previously unexplored herb pairs. To bridge classical TCM herb pairs with modern genetic relationships, we conduct gene-herb searches in PubMed and statistically validate this genetic literature as supporting evidence. We further extend this work by developing a generative language model that suggests innovative TCM formulations based on textual herb combinations.

Results: We collected 7,664 PubMed cross-search entries for gene-herb associations and 934 entries for Shenqifuzheng Injection, which served as a positive control. We analyzed 16,384 keyword combinations from Pu-Ji Fang's 426 volumes, employing statistical methods to probe gene-herb associations and focusing on differences among the target genes and Pu-Ji Fang herbs.

Conclusion: Analyzing Pu-Ji Fang reveals a historical focus on flavor over medicinal aspects in TCM. We extend our work by developing a generative model, built from classical textual keywords, that rapidly produces novel herbal compositions or TCM formulations. This integrated approach enhances our comprehension of TCM by merging ancient text analysis, modern genetic research, and generative modeling.

Keywords: TCM; TCM LSTM generative model; extraction; text annotation tool; text mining.
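
As an illustration of the association-rule step named in the Methods, the following is a minimal sketch of how herb pairs might be scored by support, confidence, and lift. It is not the authors' implementation: the function name `pair_rules`, the `min_support` threshold, and the toy formulas are all hypothetical, and the abstract does not state which algorithm or thresholds were actually used.

```python
from collections import Counter
from itertools import combinations


def pair_rules(formulas, min_support=0.5):
    """Score herb pairs as association rules.

    Returns a list of (antecedent, consequent, support, confidence, lift)
    tuples for every pair whose support is at least min_support.
    """
    n = len(formulas)
    baskets = [set(f) for f in formulas]  # each formula is a set of herbs
    item_counts = Counter(h for b in baskets for h in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of formulas containing both herbs
        if support < min_support:
            continue
        # Each pair yields two directed rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]        # P(y | x)
            lift = confidence / (item_counts[y] / n)   # P(y | x) / P(y)
            rules.append((x, y, support, confidence, lift))
    return sorted(rules, key=lambda r: (-r[4], r[0], r[1]))


# Hypothetical toy data, not taken from Pu-Ji Fang.
formulas = [
    ["ginseng", "astragalus", "licorice"],
    ["ginseng", "astragalus"],
    ["licorice", "ginger"],
    ["ginseng", "licorice", "ginger"],
    ["astragalus", "ginseng", "ginger"],
]
rules = pair_rules(formulas, min_support=0.5)
for antecedent, consequent, support, confidence, lift in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that two herbs co-occur more often than their individual frequencies would predict. This sketch covers only pairs; a full Apriori-style search over larger herb sets would need a dedicated library or a more general itemset enumeration.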

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. The study was supported by a National Science and Technology Council (NSTC) research grant (112-2221-E-008-079) in Taiwan. The funder's website is https://www.nstc.gov.tw/. The funder's involvement was limited to financial support, with no contribution to the formulation of study methodology, the gathering and interpretation of data, the decision to publish, or the composition of the manuscript.