Identifying novel protein phenotype annotations by hybridizing protein-protein interactions and protein sequence similarities

Mol Genet Genomics. 2016 Apr;291(2):913-34. doi: 10.1007/s00438-015-1157-9. Epub 2016 Jan 4.

Abstract

Studies of protein phenotypes represent a central challenge of modern genetics in the post-genome era because effective and accurate investigation of protein phenotypes is one of the most critical procedures to identify functional biological processes in microscale, which involves the analysis of multifactorial traits and has greatly contributed to the development of modern biology in the post genome era. Therefore, we have developed a novel computational method that identifies novel proteins associated with certain phenotypes in yeast based on the protein-protein interaction network. Unlike some existing network-based computational methods that identify the phenotype of a query protein based on its direct neighbors in the local network, the proposed method identifies novel candidate proteins for a certain phenotype by considering all annotated proteins with this phenotype on the global network using a shortest path (SP) algorithm. The identified proteins are further filtered using both a permutation test and their interactions and sequence similarities to annotated proteins. We compared our method with another widely used method called random walk with restart (RWR). The biological functions of proteins for each phenotype identified by our SP method and the RWR method were analyzed and compared. The results confirmed a large proportion of our novel protein phenotype annotation, and the RWR method showed a higher false positive rate than the SP method. Our method is equally effective for the prediction of proteins involving in all the eleven clustered yeast phenotypes with a quite low false positive rate. Considering the universality and generalizability of our supporting materials and computing strategies, our method can further be applied to study other organisms and the new functions we predicted can provide pertinent instructions for the further experimental verifications.

Keywords: Phenotype; Protein sequence similarity; Protein–protein interaction; Shortest path algorithm.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology*
  • Molecular Sequence Annotation / methods
  • Phenotype
  • Protein Interaction Maps / genetics*
  • Proteins / genetics*
  • Sequence Homology, Amino Acid*

Substances

  • Proteins