OmixLitMiner: A Bioinformatics Tool for Prioritizing Biological Leads from 'Omics Data Using Literature Retrieval and Data Mining

Int J Mol Sci. 2020 Feb 19;21(4):1374. doi: 10.3390/ijms21041374.

Abstract

Proteomics and genomics discovery experiments generate increasingly large result tables, so researchers need more time to convert the biological data into new knowledge. Literature review is an important step in this process and can be tedious for large-scale experiments. Making an informed, strategic decision about which biomolecule targets to pursue in follow-up experiments thus remains a considerable challenge. To streamline and formalise literature retrieval and analysis for discovery-based 'omics data, and to provide a decision-support tool for follow-up experiments, we present OmixLitMiner, a package written in the programming language R. The tool automates the retrieval of literature from PubMed based on UniProt protein identifiers, gene names and their synonyms, combined with a user-defined contextual keyword search (i.e., gene ontology based). The search strategy can be set to either strict or more lenient literature retrieval, and the outputs are assigned to three categories describing how well characterized a regulated gene or protein is. The category helps researchers decide which genes or proteins merit follow-up experiments likely to yield new knowledge, and which already known biomarkers can be excluded. We demonstrate the tool's usefulness in this retrospective study assessing three cancer proteomics publications and one cancer genomics publication. Using the tool, we were able to corroborate most of the decisions in these papers as well as detect additional biomolecule leads that may be valuable for future research.
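
The retrieval-and-categorization idea described above can be sketched roughly as follows. This is an illustrative Python sketch, not the package's R interface: the function names `build_pubmed_query` and `categorize` are hypothetical, and two details are assumptions that the abstract does not state. First, "strict" is taken to mean a title-only search and "lenient" a title-or-abstract search, using PubMed's `[Title]` and `[Title/Abstract]` field tags. Second, the three categories are assumed to depend on whether review articles, only other articles, or no articles are found.

```python
from typing import Iterable


def build_pubmed_query(names: Iterable[str],
                       keywords: Iterable[str],
                       strict: bool = True) -> str:
    """Combine gene/protein names (with synonyms) and context keywords
    into one PubMed query string.

    Assumption: strict mode restricts matches to the title, lenient
    mode also accepts matches in the abstract.
    """
    field = "[Title]" if strict else "[Title/Abstract]"
    name_clause = " OR ".join(f'"{n}"{field}' for n in names)
    keyword_clause = " OR ".join(f'"{k}"{field}' for k in keywords)
    return f"({name_clause}) AND ({keyword_clause})"


def categorize(n_reviews: int, n_articles: int) -> int:
    """Assign a hypothetical category from hit counts.

    Assumed rule (not taken from the abstract):
      1 = at least one review article found (well characterized)
      2 = no reviews, but other articles found
      3 = no articles found (potential new lead)
    """
    if n_reviews > 0:
        return 1
    if n_articles > 0:
        return 2
    return 3


# Example: a gene name plus one synonym, searched in a cancer context.
query = build_pubmed_query(["TP53", "p53"], ["cancer"], strict=True)
print(query)
print(categorize(n_reviews=0, n_articles=4))
```

The resulting query string could be submitted to PubMed by any client; the hit counts for reviews and other articles would then feed `categorize`. Keeping query construction separate from categorization makes it simple to switch between strict and lenient retrieval for the same list of identifiers.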

Keywords: bioinformatics; data mining; genomics; literature retrieval; mass spectrometry; proteomics.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Data Mining / methods*
  • Databases, Factual
  • Gene Ontology
  • Genomics
  • Humans
  • Mass Spectrometry
  • Neoplasms / genetics
  • Neoplasms / metabolism
  • Proteome / genetics
  • Proteome / metabolism
  • Proteomics
  • PubMed
  • Publications
  • Retrospective Studies
  • Software*

Substances

  • Proteome