Mining and evaluation of molecular relationships in literature

Christian Senger; Björn A Grüning; Anika Erxleben; Kersten Döring; Hitesh Patel; Stephan Flemming; Irmgard Merfort; Stefan Günther

doi:10.1093/bioinformatics/bts026

Bioinformatics. 2012 Mar 1;28(5):709-14. doi: 10.1093/bioinformatics/bts026. Epub 2012 Jan 13.

Authors

Christian Senger¹, Björn A Grüning, Anika Erxleben, Kersten Döring, Hitesh Patel, Stephan Flemming, Irmgard Merfort, Stefan Günther

Affiliation

¹ Pharmaceutical Bioinformatics, Institute of Pharmaceutical Sciences, Albert-Ludwigs-University, Hermann-Herder-Str. 9, D-79104 Freiburg, Germany.

PMID: 22247277
DOI: 10.1093/bioinformatics/bts026

Abstract

Motivation: Specific information on newly discovered proteins is often difficult to find in the literature. Particularly when only sequences are available and no common protein or gene names, a preceding sequence similarity search can be crucial for collecting information. In drug research, it is important to know whether a small molecule targets only one specific protein or also affects similar or homologous proteins, which may account for possible side effects.

Results: prolific (protein-literature investigation for interacting compounds) provides a one-step solution for investigating the available information on given protein names, sequences, similar proteins or gene-level sequences. Co-occurrences of UniProtKB/Swiss-Prot proteins and PubChem compounds in all PubMed abstracts can be retrieved. Concise 'heat-maps' and tables display the frequencies of these co-occurrences and link to the processed literature, in which the detected protein and compound synonyms are highlighted. In an evaluation against manually curated drug-protein relationships, up to 69% could be discovered by automatic text processing. Examples are presented to demonstrate the capabilities of prolific.
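The idea of counting protein-compound co-occurrences per abstract can be illustrated with a minimal sketch. It is not the implementation used by prolific, which the abstract does not describe. It assumes plain dictionary matching of synonyms as whole words and counts each pair at most once per abstract. The function names, the synonym dictionaries and the three toy sentences are hypothetical and were invented for illustration.

```python
import re
from collections import Counter
from itertools import product


def find_entities(text, synonyms):
    """Return the entities that have at least one synonym in the text.

    `synonyms` maps an entity name to a list of synonym strings.
    Matching is case-insensitive and restricted to whole words
    (an assumption of this sketch, not a documented prolific rule).
    """
    found = set()
    for entity, names in synonyms.items():
        for name in names:
            pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(entity)
                break  # one matching synonym is enough for this entity
    return found


def count_cooccurrences(abstracts, protein_synonyms, compound_synonyms):
    """Count, for each (protein, compound) pair, the abstracts mentioning both."""
    counts = Counter()
    for text in abstracts:
        proteins = find_entities(text, protein_synonyms)
        compounds = find_entities(text, compound_synonyms)
        for pair in product(sorted(proteins), sorted(compounds)):
            counts[pair] += 1
    return counts


# Hypothetical toy data, standing in for UniProtKB/Swiss-Prot and PubChem synonyms.
protein_synonyms = {
    "COX-1": ["COX-1", "cyclooxygenase-1"],
    "COX-2": ["COX-2", "cyclooxygenase-2"],
}
compound_synonyms = {
    "aspirin": ["aspirin", "acetylsalicylic acid"],
    "celecoxib": ["celecoxib"],
}
abstracts = [
    "Celecoxib selectively inhibits cyclooxygenase-2.",
    "Aspirin acetylates COX-1 and COX-2.",
    "Celecoxib showed weak activity against COX-1 compared with COX-2.",
]

counts = count_cooccurrences(abstracts, protein_synonyms, compound_synonyms)
for (protein, compound), n in sorted(counts.items()):
    print(f"{protein}\t{compound}\t{n}")
```

The resulting counts form a protein-by-compound matrix, which is the kind of frequency table a heat-map would display. Counting each pair once per abstract keeps a single long abstract from dominating the frequencies; a sentence-level window would be a stricter alternative.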

Availability: The web application is available at http://prolific.pharmaceutical-bioinformatics.de and a web service is available at http://www.pharmaceutical-bioinformatics.de/prolific/soap/prolific.wsdl.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Data Mining*
Databases, Protein*
Drug Discovery
Internet
Proteins / metabolism
PubMed

Substances

Proteins