Determining effects of non-synonymous SNPs on protein-protein interactions using supervised and semi-supervised learning

Nan Zhao; Jing Ginger Han; Chi-Ren Shyu; Dmitry Korkin

doi:10.1371/journal.pcbi.1003592

PLoS Comput Biol. 2014 May 1;10(5):e1003592. doi: 10.1371/journal.pcbi.1003592. eCollection 2014 May.

Authors

Nan Zhao¹, Jing Ginger Han¹, Chi-Ren Shyu², Dmitry Korkin³

Affiliations

¹ Informatics Institute, University of Missouri, Columbia, Missouri, United States of America.
² Informatics Institute, University of Missouri, Columbia, Missouri, United States of America; Department of Computer Science, University of Missouri, Columbia, Missouri, United States of America.
³ Informatics Institute, University of Missouri, Columbia, Missouri, United States of America; Department of Computer Science, University of Missouri, Columbia, Missouri, United States of America; Bond Life Science Center, University of Missouri, Columbia, Missouri, United States of America.

Abstract

Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs) have been found near or inside the protein-protein interaction (PPI) interfaces. Determining whether such an nsSNP will disrupt or preserve a PPI is a challenging task, both experimentally and computationally. Here, we present this task as three related classification problems and develop a new computational method, called the SNP-IN tool (non-synonymous SNP INteraction effect predictor). Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1) a 2-class problem (strengthening/weakening PPI mutations), (2) another 2-class problem (mutations that disrupt/preserve a PPI), and (3) a 3-class classification (detrimental/neutral/beneficial mutation effects). In total, 11 different supervised and semi-supervised classifiers were trained and assessed, resulting in promising performance, with the weighted F-measure ranging from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of the SNP-IN tool, we applied it to study the nsSNP-induced rewiring of two disease-centered networks.
The accurate and balanced performance of the SNP-IN tool makes it well suited to studying the rewiring of large-scale protein-protein interaction networks, and the tool can be useful for functional annotation of disease-associated SNPs. The SNP-IN tool is freely accessible as a web server at http://korkinlab.org/snpintool/.
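
For readers unfamiliar with the metric reported above, a weighted F-measure is commonly computed as the average of per-class F1 scores, each weighted by the number of true examples in that class. The sketch below illustrates that standard definition only; it is not the authors' evaluation code. The function name, the class labels (borrowed from Problem 3) and the toy predictions are hypothetical.

```python
from collections import Counter


def weighted_f_measure(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (illustrative sketch)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: examples of this class that were predicted as this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Each class contributes in proportion to its share of the true labels.
        score += (n_true / total) * f1
    return score


# Hypothetical toy labels for a 3-class problem (not data from the study).
y_true = ["detrimental", "detrimental", "detrimental", "neutral", "neutral", "beneficial"]
y_pred = ["detrimental", "detrimental", "neutral", "neutral", "neutral", "beneficial"]
toy_score = weighted_f_measure(y_true, y_pred)
```

Because the weights follow class frequencies, a classifier that does well only on the majority class can still score high, which is one reason a balanced performance across classes is worth reporting alongside this metric.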

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Artificial Intelligence*
Breast Neoplasms / genetics*
Diabetes Mellitus / genetics*
Genetic Association Studies
Genetic Predisposition to Disease / genetics*
Humans
Pattern Recognition, Automated / methods
Polymorphism, Single Nucleotide / genetics*
Protein Interaction Mapping / methods*
Proteome / genetics*

Substances

Proteome

Grants and funding

We acknowledge funding from the National Science Foundation (DBI-0845196, IOS-1126992 to DK). NZ is supported by the National Science Foundation (IOS-1126992). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.