Text mining in a literature review of urothelial cancer using topic model

BMC Cancer. 2020 May 24;20(1):462. doi: 10.1186/s12885-020-06931-0.

Abstract

Background: Urothelial cancer (UC) includes carcinomas of the bladder, ureters, and renal pelvis. New treatments and biomarkers for UC have emerged in this decade. Identifying the key information in such a vast body of literature can be challenging. In this study, we used text mining to explore UC publications and identify important information that may lead to new research directions.

Method: We used topic modeling to analyze the titles and abstracts of 29,883 articles on UC retrieved from PubMed, Web of Science, and Embase in March 2020. We applied latent Dirichlet allocation (LDA) modeling to extract 15 topics and conducted trend analysis. Gene Ontology (GO) term enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed to identify UC-related pathways.
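To make the topic modeling and trend analysis steps concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, followed by a per-year average of topic shares. It is an illustration only: the abstract does not state which software, inference method, or hyperparameters were used, so the function names `fit_lda` and `topic_trend`, the values of `alpha` and `beta`, and the four-document toy corpus are all assumptions made for this example.

```python
import random
from collections import defaultdict


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents.

    Returns (doc_topic, topic_word, vocab). doc_topic[d][k] is the smoothed
    share of topic k in document d; topic_word[k][w] is the smoothed
    probability of vocabulary word w under topic k.
    alpha and beta are assumed symmetric Dirichlet priors.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)

    n_dk = [[0] * n_topics for _ in docs]             # document-topic counts
    n_kw = [[0] * n_vocab for _ in range(n_topics)]   # topic-word counts
    n_k = [0] * n_topics                              # tokens assigned to each topic
    z = []                                            # topic assignment per token

    # Random initialisation of the topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][wi] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][wi] + beta)
                    / (n_k[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wi] += 1
                n_k[k] += 1

    doc_topic = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in n_dk[d]]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(c + beta) / (n_k[k] + n_vocab * beta) for c in n_kw[k]]
        for k in range(n_topics)
    ]
    return doc_topic, topic_word, vocab


def topic_trend(doc_topic, years):
    """Mean share of each topic among the documents published in each year."""
    by_year = defaultdict(list)
    for row, year in zip(doc_topic, years):
        by_year[year].append(row)
    return {
        year: [sum(col) / len(rows) for col in zip(*rows)]
        for year, rows in sorted(by_year.items())
    }


# Toy corpus (invented for illustration, not taken from the study).
docs = [
    "bladder cancer smoking risk".split(),
    "aristolochic acid upper tract risk".split(),
    "checkpoint inhibitor immunotherapy bladder cancer".split(),
    "checkpoint immunotherapy response survival".split(),
]
years = [2005, 2005, 2019, 2019]

doc_topic, topic_word, vocab = fit_lda(docs, n_topics=2, n_iter=100)
trend = topic_trend(doc_topic, years)
for year, shares in trend.items():
    print(year, [round(s, 3) for s in shares])
```

On a real corpus the same idea would be run with 15 topics on preprocessed titles and abstracts, and a rising or falling mean share across years would indicate a growing or declining topic. A dedicated library would normally replace this hand-written sampler, which is far too slow for tens of thousands of articles.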

Results: Publications on UC treatment, especially immune checkpoint therapy, showed a growing trend, whereas those on UC staging did not. The risk factors for UC varied across countries, such as cigarette smoking in the United States and aristolochic acid in Taiwan and China. The GMCSF, IL-5, Syndecan-1, ErbB receptor, integrin, c-Met, and TRAIL signaling pathways were the most relevant biological pathways associated with UC.
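Pathway rankings of this kind come from term enrichment analysis, which is commonly scored with a one-sided hypergeometric (over-representation) test. The sketch below shows that calculation only as an illustration; the abstract does not say which tool or statistic was used, and the function name `enrichment_p_value` and the gene counts in the usage lines are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap).

    n_background : genes in the background set
    n_in_pathway : background genes annotated to the pathway
    n_selected   : genes in the list being tested
    n_overlap    : selected genes that are annotated to the pathway
    """
    upper = min(n_in_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts, chosen only to show the call.
p_small = enrichment_p_value(n_background=10, n_in_pathway=5, n_selected=5, n_overlap=5)
p_large = enrichment_p_value(n_background=20000, n_in_pathway=100, n_selected=50, n_overlap=5)
print(p_small, p_large)
```

A small p-value means the gene list contains more members of the pathway than random sampling from the background would be expected to give. In practice the p-values for all tested pathways would also be corrected for multiple testing before ranking.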

Conclusions: The risk factors for UC may depend on the country, and the GMCSF, IL-5, Syndecan-1, ErbB receptor, integrin, c-Met, and TRAIL signaling pathways are the most relevant biological pathways associated with UC. These findings may suggest directions for further UC research.

Keywords: LDA2vec; Research trends; Text mining; Topic modeling; Urothelial carcinoma.

Publication types

  • Review

MeSH terms

  • Data Mining / statistics & numerical data*
  • Humans
  • Kidney Pelvis / pathology*
  • Models, Theoretical*
  • Prognosis
  • Risk Factors
  • Urologic Neoplasms / diagnosis*
  • Urologic Neoplasms / epidemiology
  • Urologic Neoplasms / therapy*