Predicting disease-associated substitution of a single amino acid by analyzing residue interactions

BMC Bioinformatics. 2011 Jan 12;12:14. doi: 10.1186/1471-2105-12-14.

Abstract

Background: The rapid accumulation of data on non-synonymous single nucleotide polymorphisms (nsSNPs), also called single amino acid polymorphisms (SAPs), should further our understanding of the underlying disease-associated mechanisms. Here, we use complex networks to study the role of an amino acid in both local and global structures and to determine the extent to which disease-associated and polymorphic SAPs differ in their interactions with other residues.
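As a rough illustration of the network view described above, the following minimal sketch treats residues as nodes and residue contacts as edges, then computes a node's degree and its closeness centrality. The contact list, the residue labels, and the choice of closeness as the centrality measure are assumptions made for illustration only; the abstract does not state how contacts were defined or which centrality measure was used.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (residue, residue) contact pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph, node):
    """Number of residues in direct contact with `node`."""
    return len(graph[node])


def closeness(graph, node):
    """Closeness centrality: reachable nodes divided by the sum of
    shortest-path lengths from `node` (computed by breadth-first search)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical toy contacts, not taken from any real protein structure.
TOY_CONTACTS = [
    ("R10", "R11"),
    ("R11", "R12"),
    ("R11", "R45"),
    ("R11", "R80"),
    ("R45", "R46"),
]

graph = build_graph(TOY_CONTACTS)
for residue in sorted(graph):
    print(residue, degree(graph, residue), round(closeness(graph, residue), 3))
```

In this toy network, R11 has the most contacts and the shortest average path to every other residue, which is the kind of site the network features are meant to flag.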

Results: We found that SAPs can be well characterized by network topological features. A mutation is more likely to be disease-associated when it occurs at a site with a high centrality value and/or a high degree value in a protein structure network. We also found that examining the residues neighboring a mutation site can help to determine whether the mutation is disease-related. We compiled a dataset from the Swiss-Prot variant pages and constructed a model based on the random forest algorithm to predict disease-associated SAPs. In 5-fold cross-validation, the total accuracy and Matthews correlation coefficient (MCC) were 83.0% and 0.64, respectively. On an independent dataset, the model achieved a total accuracy of 80.8% and an MCC of 0.59.
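For readers unfamiliar with the two reported metrics, the sketch below shows how total accuracy and MCC are computed from the counts of a binary confusion matrix. The function names and the example counts are hypothetical and are not the study's data; returning 0.0 when the MCC denominator is zero is a common convention assumed here.

```python
import math


def total_accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when the denominator is zero (assumed convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Made-up counts for illustration only (disease-associated = positive class).
tp, tn, fp, fn = 8, 9, 1, 2
print("accuracy:", total_accuracy(tp, tn, fp, fn))
print("MCC:", mcc(tp, tn, fp, fn))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the disease-associated and polymorphic classes are unevenly sized.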

Conclusions: The satisfactory performance suggests that network topological features can be used as quantitative measures of the importance of a site in a protein, and that this approach can complement existing methods for predicting disease-associated SAPs. Moreover, applying this method in SAP studies would help to determine the underlying linkage between SAPs and diseases through extensive investigation of the mutual interactions between residues.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Substitution*
  • Computational Biology / methods*
  • DNA Mutational Analysis
  • Databases, Protein
  • Genetic Association Studies / methods*
  • Humans
  • Models, Statistical
  • Mutation
  • Polymorphism, Single Nucleotide*
  • Proteins / analysis
  • Sequence Analysis, Protein

Substances

  • Proteins