A Semantic Analysis and Community Detection-Based Artificial Intelligence Model for Core Herb Discovery from the Literature: Taking Chronic Glomerulonephritis Treatment as a Case Study

Comput Math Methods Med. 2020 Sep 1;2020:1862168. doi: 10.1155/2020/1862168. eCollection 2020.

Abstract

Formulae are the main treatment method in Traditional Chinese Medicine (TCM). A formula often contains multiple herbs, among which the core herbs play the critical therapeutic role in treating a disease. Identifying the core herbs in formulae is therefore important for providing evidence and references for the clinical application of Chinese herbs and formulae. In this paper, we propose CHDSC, a core herb discovery model based on semantic analysis and community detection that discovers the core herbs for treating a given disease from large-scale literature. CHDSC comprises three stages: corpus construction, herb network establishment, and core herb discovery. It uses two artificial intelligence modules: the Chinese word embedding algorithm ESSP2VEC, designed to analyse the semantics of herbs in Chinese literature based on the stroke, structure, and pinyin features of Chinese characters, and the label propagation-based algorithm LILPA, adopted to detect herb communities and core herbs in the herbal semantic network constructed from the literature. To validate the proposed model, we take chronic glomerulonephritis (CGN) as an example, retrieve 1126 articles on treating CGN with TCM from the China National Knowledge Infrastructure (CNKI), and apply CHDSC to the collected literature. Experimental results show that CHDSC discovers three major herb communities and eighteen core herbs for treating different CGN syndromes with high accuracy. The community size, degree, and closeness centrality distributions of the herb network are also analysed to identify patterns among core herbs. Core herbs mainly appear in communities with more than 25 herbs, and the degree and closeness centrality of core herb nodes are concentrated in the ranges [15, 40] and [0.25, 0.45], respectively. Thus, semantic analysis and community detection are helpful for mining effective core herbs for treating a given disease from large-scale literature.
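
The network quantities mentioned above (community membership, degree, and closeness centrality) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe LILPA's update rule, so a plain label propagation with deterministic tie-breaking stands in for it. The herb names, the toy network, and the function candidate_core_herbs are hypothetical. The abstract reports the size, degree, and closeness ranges as observations, so using ranges as a filter is an assumption made here for illustration only.

```python
from collections import Counter, deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (herb, herb) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def closeness_centrality(graph, source):
    """Reachable nodes divided by the sum of BFS distances from source.

    For a connected graph this equals (n - 1) / (sum of shortest-path lengths).
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def label_propagation(graph, max_rounds=100):
    """Plain label propagation (a stand-in, NOT LILPA).

    Each node adopts the most frequent label among its neighbours. Ties keep
    the current label if possible, otherwise take the largest label, so the
    example is reproducible.
    """
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            counts = Counter(labels[nbr] for nbr in graph[node])
            best = max(counts.values())
            tied = [lab for lab, c in counts.items() if c == best]
            if labels[node] in tied:
                continue
            labels[node] = max(tied)
            changed = True
        if not changed:
            break
    communities = {}
    for node, lab in labels.items():
        communities.setdefault(lab, set()).add(node)
    return list(communities.values())


def candidate_core_herbs(graph, communities, min_size, degree_range, closeness_range):
    """Hypothetical filter: herbs in large communities whose degree and
    closeness fall inside the given inclusive ranges."""
    picked = []
    for community in communities:
        if len(community) <= min_size:
            continue
        for herb in community:
            deg = len(graph[herb])
            clo = closeness_centrality(graph, herb)
            if (degree_range[0] <= deg <= degree_range[1]
                    and closeness_range[0] <= clo <= closeness_range[1]):
                picked.append(herb)
    return sorted(picked)


# Toy network (hypothetical): two fully connected groups joined by one edge.
group1 = ["herb_a", "herb_b", "herb_c", "herb_d"]
group2 = ["herb_e", "herb_f", "herb_g", "herb_h"]
edges = (list(combinations(group1, 2)) + list(combinations(group2, 2))
         + [("herb_d", "herb_e")])

graph = build_graph(edges)
communities = label_propagation(graph)
degree = {herb: len(nbrs) for herb, nbrs in graph.items()}
closeness = {herb: closeness_centrality(graph, herb) for herb in graph}

# Thresholds are scaled down for the toy graph; they are not the paper's values.
candidates = candidate_core_herbs(graph, communities, min_size=3,
                                  degree_range=(4, 4), closeness_range=(0.6, 0.8))
print(communities)
print(candidates)
```

On a real herb network, the ranges reported in the abstract (communities with more than 25 herbs, degree in [15, 40], closeness in [0.25, 0.45]) could be passed as min_size=25, degree_range=(15, 40), and closeness_range=(0.25, 0.45), again only as a rough screening heuristic.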

Publication types

  • Validation Study

MeSH terms

  • Algorithms
  • Artificial Intelligence
  • China
  • Chronic Disease
  • Computational Biology
  • Data Mining
  • Databases, Pharmaceutical
  • Drug Discovery / methods*
  • Drug Discovery / statistics & numerical data
  • Drugs, Chinese Herbal / classification*
  • Drugs, Chinese Herbal / therapeutic use*
  • Glomerulonephritis / drug therapy*
  • Humans
  • Mathematical Concepts
  • Medicine, Chinese Traditional / methods
  • Medicine, Chinese Traditional / statistics & numerical data
  • Phytotherapy*
  • Semantics

Substances

  • Drugs, Chinese Herbal