Finding the evidence for protein-protein interactions from PubMed abstracts

Bioinformatics. 2006 Jul 15;22(14):e220-6. doi: 10.1093/bioinformatics/btl203.

Abstract

Motivation: Protein-protein interactions play critical roles in biological processes, and many biologists try to find or predict crucial information about these interactions. Before an interaction is verified in the laboratory, it should be validated against previous research. Although many efforts have been made to create databases that store verified information in a structured form, much interaction information still exists only as unstructured text. As the number of new publications has grown rapidly, a large body of research has sought to extract interactions from text automatically. However, various difficulties remain in incorporating automatically generated results into manually annotated databases. For interactions that are not found in manually curated databases, researchers resort to searching abstracts or full papers.

Results: A PubMed search for two proteins frequently returns hundreds of abstracts. In this paper, a method is introduced that validates protein-protein interactions from PubMed abstracts. A query is generated automatically from two given proteins, and abstracts are then collected from PubMed. Target proteins and their synonyms are then recognized, and their interaction information is extracted from the collection. Of the interactions in the DIP-PPI corpus, 67.37% were found in the PubMed abstracts and 87.37% were found in the given full texts.
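
The steps described above (query generation, abstract collection, synonym recognition, evidence extraction) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give their query format or extraction rules. The sketch assumes the NCBI E-utilities ESearch endpoint and the PubMed [Title/Abstract] field tag for the query, and it swaps in a simple sentence-level co-occurrence test with a hypothetical list of cue words in place of the paper's extraction method. The function names, the cue list, and the protein names PROTA, PROTB, and PB2 are placeholders introduced here for illustration.

```python
import re
from urllib.parse import urlencode

# NCBI E-utilities search endpoint; the URL is only constructed here, never fetched.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical interaction cue words (an assumption, not taken from the paper).
CUE = re.compile(r"\b(interact\w*|bind\w*|bound|associat\w*|complex\w*)\b", re.I)


def build_query(names_a, names_b):
    """OR together the synonyms of each protein, then AND the two groups."""
    def group(names):
        return "(" + " OR ".join(f'"{n}"[Title/Abstract]' for n in names) + ")"
    return group(names_a) + " AND " + group(names_b)


def esearch_url(query, retmax=200):
    """Build an ESearch URL that would return up to `retmax` PubMed IDs."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": query, "retmax": retmax})


def mentions(sentence, names):
    """True if any name or synonym occurs as a whole token (case-insensitive)."""
    return any(
        re.search(r"(?<!\w)" + re.escape(n) + r"(?!\w)", sentence, re.I)
        for n in names
    )


def evidence_sentences(abstract, names_a, names_b):
    """Return sentences that mention both proteins and contain a cue word."""
    sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())
    return [
        s for s in sentences
        if mentions(s, names_a) and mentions(s, names_b) and CUE.search(s)
    ]


if __name__ == "__main__":
    a, b = ["PROTA"], ["PROTB", "PB2"]  # placeholder names and synonym
    print(esearch_url(build_query(a, b)))
    toy = "PROTA was purified. PROTA binds pb2 in vitro. PROTB is nuclear."
    print(evidence_sentences(toy, a, b))
```

Sentence-level co-occurrence is a deliberately crude stand-in: it over-reports negated or speculative statements, which is one reason automatically extracted results are hard to merge into curated databases.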

Availability: Contact authors.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Abstracting and Indexing / methods*
  • Algorithms
  • Artificial Intelligence
  • Evidence-Based Medicine / methods
  • Information Storage and Retrieval / methods*
  • Natural Language Processing*
  • Pattern Recognition, Automated
  • Periodicals as Topic
  • Protein Interaction Mapping / methods*
  • Proteins / classification*
  • Proteins / metabolism*
  • PubMed*
  • Vocabulary, Controlled

Substances

  • Proteins