DSEATM: drug set enrichment analysis uncovering disease mechanisms by biomedical text mining

Brief Bioinform. 2022 Jul 18;23(4):bbac228. doi: 10.1093/bib/bbac228.

Abstract

Disease pathogenesis is a long-standing major topic in biomedical research. With the exponential growth of biomedical information, analysis of drug effects on specific phenotypes has shown great promise in uncovering disease-associated pathways. However, this approach has only been applied to a limited number of drugs. Here, we extracted data on 4634 diseases, 3671 drugs, 112 809 disease-drug associations and 81 527 drug-gene associations by text mining 29 168 919 publications. On this basis, we proposed a 'Drug Set Enrichment Analysis by Text Mining (DSEATM)' pipeline and applied it to 3250 diseases, where it outperformed the state-of-the-art method. Furthermore, disease pathways enriched by DSEATM were similar to those obtained from differentially expressed genes in TCGA cancer RNA-seq data. In addition, the number of drugs, which showed a strong positive correlation of 0.73 with the AUC, plays a determining role in the performance of DSEATM. Taken together, DSEATM is a promising and accurate disease research tool that offers fresh insights.
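
The abstract does not say which statistic DSEATM uses, so the following is only a minimal sketch of the general idea of drug set enrichment. It assumes a one-sided hypergeometric over-representation test, which is a common choice for set enrichment and not necessarily the authors' method. The names `hypergeom_sf` and `enrich`, and all drug, gene and pathway identifiers, are hypothetical toy values.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N items from a population of M with n successes."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def enrich(disease_drugs, drug_genes, pathways, background):
    """Score each pathway for over-representation among drug-associated genes.

    Assumption: genes linked to any drug of the disease are pooled into one
    set, then tested against each pathway (not confirmed by the source).
    """
    genes = set()
    for drug in disease_drugs:
        genes |= drug_genes.get(drug, set())
    genes &= background  # keep only genes in the tested universe

    p_values = {}
    for name, members in pathways.items():
        members = members & background
        overlap = len(genes & members)
        p_values[name] = hypergeom_sf(
            overlap, len(background), len(members), len(genes)
        )
    return p_values


# Toy data (entirely made up for illustration)
background = {f"G{i}" for i in range(20)}
drug_genes = {"drugA": {"G0", "G1"}, "drugB": {"G1", "G2", "G3"}}
pathways = {
    "P1": {"G0", "G1", "G2", "G3"},
    "P2": {"G10", "G11", "G12", "G13"},
}

res = enrich(["drugA", "drugB"], drug_genes, pathways, background)
for name, p in sorted(res.items(), key=lambda item: item[1]):
    print(name, p)
```

In this toy case the pathway sharing all four drug-associated genes (`P1`) receives a much smaller p-value than the disjoint pathway (`P2`). A real analysis would also need multiple-testing correction across pathways.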

Keywords: MeSH; disease pathway; drug set enrichment analysis; text mining.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomedical Research*
  • Data Mining* / methods
  • Phenotype