Gelsius: a literature-based workflow for determining quantitative associations between genes and biological processes

IEEE/ACM Trans Comput Biol Bioinform. 2013 May-Jun;10(3):619-31. doi: 10.1109/TCBB.2013.11.

Abstract

An effective methodology for extracting and quantifying knowledge from biomedical literature would allow researchers to organize and analyze the results of high-throughput experiments based on microarrays and next-generation sequencing technologies. Despite the large amount of raw information available on the web, a tool able to extract a measure of the correlation between a list of genes and biological processes is not yet available. In this paper, we present Gelsius, a workflow that uses biomedical literature to quantify the correlation between genes and terms describing biological processes. To this end, we build separate modules for query expansion and document canonicalization. These modules improve the measurement of correlation, which is computed using a latent semantic analysis approach. To the best of our knowledge, this is the first complete tool able to extract a measure of the correlation between genes and biological processes from literature. We demonstrate the effectiveness of the proposed workflow on six biological processes and a set of genes, showing that the correlation results for known relationships agree with the definitions of gene functions provided by the NCI Thesaurus. In addition, the tool is able to propose new candidate relationships for later experimental validation. The tool is available at http://bioeda1.polito.it:8080/medSearchServlet/.
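
As an illustration of the latent semantic analysis step mentioned in the abstract, the following is a minimal generic sketch, not the Gelsius implementation. The abstract does not describe the term weighting, the number of latent dimensions, or how query expansion and canonicalization feed into the analysis, so everything here is an assumption: the four-document toy corpus, the placeholder tokens (gene_a, process_x, and so on), the raw count weighting, the choice k = 2, and the helper name correlation are all hypothetical.

```python
import numpy as np

# Toy corpus of placeholder tokens (hypothetical; not real literature).
# gene_a and process_x never appear in the same document, but both
# co-occur with marker_one.
docs = [
    "gene_a marker_one",
    "marker_one process_x",
    "gene_b marker_two",
    "marker_two process_y",
]

# Vocabulary and term -> row index lookup.
vocab = sorted({tok for d in docs for tok in d.split()})
index = {term: i for i, term in enumerate(vocab)}

# Term-document matrix of raw counts (assumed weighting).
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for tok in d.split():
        A[index[tok], j] += 1.0

# Truncated SVD: keep the k largest singular values (k is an assumption).
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
term_vecs = U[:, :k] * s[:k]  # one k-dimensional vector per term


def correlation(term_a, term_b):
    """Cosine similarity between two terms in the latent space."""
    va, vb = term_vecs[index[term_a]], term_vecs[index[term_b]]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))


print("gene_a vs process_x:", correlation("gene_a", "process_x"))
print("gene_a vs process_y:", correlation("gene_a", "process_y"))
```

In this toy setting, gene_a and process_x share no document yet score close to 1 in the latent space because both co-occur with marker_one, while gene_a and process_y, which share no context at all, score close to 0. This indirect association is the property that makes latent semantic analysis attractive for proposing candidate gene-process relationships.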

MeSH terms

  • Data Mining / methods*
  • Databases, Genetic
  • Gene Ontology*
  • Genes / genetics
  • Genes / physiology
  • Genomics / methods*
  • Humans
  • Software*
  • Unified Medical Language System*