Mining and evaluation of molecular relationships in literature

Bioinformatics. 2012 Mar 1;28(5):709-14. doi: 10.1093/bioinformatics/bts026. Epub 2012 Jan 13.

Abstract

Motivation: Specific information on newly discovered proteins is often difficult to find in the literature. Particularly when only sequences are available and no common names of the proteins or genes are known, a preceding sequence similarity search can be crucial for collecting information. In drug research, it is important to know whether a small molecule targets only one specific protein or also influences similar or homologous proteins, which may account for possible side effects.

Results: prolific (protein-literature investigation for interacting compounds) provides a one-step solution to investigate available information on given protein names, sequences, similar proteins or sequences on the gene level. Co-occurrences of UniProtKB/Swiss-Prot proteins and PubChem compounds in all PubMed abstracts are retrievable. Concise 'heat-maps' and tables display the frequencies of these co-occurrences and link to the processed literature, in which the protein and compound synonyms found are highlighted. Evaluation against manually curated drug-protein relationships showed that up to 69% of them could be discovered by automatic text processing. Examples are presented to demonstrate the capabilities of prolific.
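
The co-occurrence counting described above can be illustrated with a minimal sketch. It is not the prolific implementation: the function names, the synonym dictionaries and the three toy sentences are assumptions made for illustration, and simple case-insensitive whole-term matching stands in for whatever text processing the tool actually applies to PubMed abstracts.

```python
import re
from collections import Counter
from itertools import product


def find_entities(text, synonyms):
    """Return the IDs whose synonyms occur as whole terms in text (hypothetical helper)."""
    found = set()
    for entity_id, names in synonyms.items():
        for name in names:
            # Lookarounds keep "COX-1" from matching inside "COX-12".
            pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(entity_id)
                break
    return found


def cooccurrence_counts(abstracts, protein_synonyms, compound_synonyms):
    """Count, per (protein, compound) pair, the abstracts that mention both."""
    counts = Counter()
    for text in abstracts:
        proteins = find_entities(text, protein_synonyms)
        compounds = find_entities(text, compound_synonyms)
        for pair in product(proteins, compounds):
            counts[pair] += 1
    return counts


# Toy input, invented for this sketch; real input would be PubMed abstracts
# with UniProtKB/Swiss-Prot and PubChem synonym lists.
abstracts = [
    "Aspirin irreversibly inhibits COX-1 and COX-2.",
    "Celecoxib is a selective inhibitor of cyclooxygenase-2.",
    "Acetylsalicylic acid acetylates COX-1 in platelets.",
]
protein_synonyms = {
    "PTGS1": ["COX-1", "cyclooxygenase-1"],
    "PTGS2": ["COX-2", "cyclooxygenase-2"],
}
compound_synonyms = {
    "aspirin": ["aspirin", "acetylsalicylic acid"],
    "celecoxib": ["celecoxib"],
}

counts = cooccurrence_counts(abstracts, protein_synonyms, compound_synonyms)

# Print the frequency matrix that a heat-map would visualize.
for protein in sorted(protein_synonyms):
    row = [counts[(protein, compound)] for compound in sorted(compound_synonyms)]
    print(protein, row)
```

Each row of the printed matrix corresponds to one protein and each column to one compound, which is the shape of data a heat-map of co-occurrence frequencies would draw from.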

Availability: The web application is available at http://prolific.pharmaceutical-bioinformatics.de, and a web service is available at http://www.pharmaceutical-bioinformatics.de/prolific/soap/prolific.wsdl.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Mining*
  • Databases, Protein*
  • Drug Discovery
  • Internet
  • Proteins / metabolism
  • PubMed

Substances

  • Proteins