Determining effects of non-synonymous SNPs on protein-protein interactions using supervised and semi-supervised learning

PLoS Comput Biol. 2014 May 1;10(5):e1003592. doi: 10.1371/journal.pcbi.1003592. eCollection 2014 May.

Abstract

Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs) have been found near or inside protein-protein interaction (PPI) interfaces. Determining whether such an nsSNP will disrupt or preserve a PPI is a challenging task, both experimentally and computationally. Here, we present this task as three related classification problems and develop a new computational method, called the SNP-IN tool (non-synonymous SNP INteraction effect predictor). Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1) a 2-class problem (strengthening/weakening PPI mutations), (2) another 2-class problem (mutations that disrupt/preserve a PPI), and (3) a 3-class problem (detrimental/neutral/beneficial mutation effects). In total, 11 different supervised and semi-supervised classifiers were trained and assessed, with promising performance: the weighted F-measure ranged from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating the prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of the SNP-IN tool, we applied it to study the nsSNP-induced rewiring of two disease-centered networks.
The accurate and balanced performance of the SNP-IN tool makes it readily applicable to studying the rewiring of large-scale protein-protein interaction networks, and it can be useful for functional annotation of disease-associated SNPs. The SNP-IN tool is freely accessible as a web server at http://korkinlab.org/snpintool/.
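
The abstract reports performance as a weighted F-measure but does not spell out the averaging convention. The sketch below assumes the common definition: the per-class F1 scores averaged with weights proportional to each class's share of the true labels. The function name, the label strings, and the toy predictions are hypothetical illustrations, not data or code from the study.

```python
from collections import Counter


def weighted_f_measure(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores.

    Assumption: "weighted" means each class's F1 is weighted by the
    fraction of true labels belonging to that class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n_true - tp
        denom = 2 * tp + fp + fn
        f1 = (2 * tp / denom) if denom else 0.0
        score += (n_true / total) * f1
    return score


# Toy 3-class example (hypothetical labels in the style of Problem 3).
y_true = ["detrimental"] * 4 + ["neutral"] * 4 + ["beneficial"] * 2
y_pred = (
    ["detrimental", "detrimental", "detrimental", "neutral"]
    + ["neutral", "neutral", "neutral", "detrimental"]
    + ["beneficial", "beneficial"]
)
toy_score = weighted_f_measure(y_true, y_pred)
print(round(toy_score, 3))
```

Weighting by class support keeps a small class, such as beneficial mutations in this toy set, from dominating the score while still penalizing errors on it in proportion to its size.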

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Breast Neoplasms / genetics*
  • Diabetes Mellitus / genetics*
  • Genetic Association Studies
  • Genetic Predisposition to Disease / genetics*
  • Humans
  • Pattern Recognition, Automated / methods
  • Polymorphism, Single Nucleotide / genetics*
  • Protein Interaction Mapping / methods*
  • Proteome / genetics*

Substances

  • Proteome

Grants and funding

We acknowledge funding from the National Science Foundation (DBI-0845196, IOS-1126992 to DK). NZ is supported by the National Science Foundation (IOS-1126992). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.