A similarity network approach for the analysis and comparison of protein sequence/structure sets

J Biomed Inform. 2010 Apr;43(2):257-67. doi: 10.1016/j.jbi.2010.01.005. Epub 2010 Jan 25.

Abstract

A set of proteins is a complex system whose elements are interrelated through sequence- and structure-based similarity. Here, we applied a similarity network-based methodology to represent and analyze sets of protein sequences and structures, using a non-redundant set of 311 proteins and three different information criteria based on sequence-derived features, local sequence alignment, and structural alignment. A wide set of measurements, such as network degree, clustering coefficient, characteristic path length, and vertex centrality, was used to characterize the networks' topology. The protein similarity networks were found to be moderately or highly interconnected, and the existence of both clusters and random edges classified their fully connected versions as Small World Networks (SWNs). The SWN architecture was able to host the continuous similarity transition among proteins and to model the protein information flow during evolution. Recently reported ancestral elements, such as the alpha/beta class and certain folds, were notably found to act as hubs in the networks. Additionally, the moderate information value of sequence-derived features when used for fold and class assignment was shown on a network basis. The methodology described here can be applied to the analysis of other complex systems that consist of interrelated elements and a certain information flow.
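The topology measurements named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity matrix, the threshold, and all function names below are hypothetical, and the abstract does not state how edges were derived from similarity scores. The sketch assumes an edge is drawn whenever a pairwise similarity reaches a fixed cutoff, then computes mean degree, average clustering coefficient, and characteristic path length.

```python
from collections import deque

def build_network(sim, threshold):
    """Link proteins i and j when sim[i][j] >= threshold (assumed edge rule)."""
    n = len(sim)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def mean_degree(adj):
    """Average number of neighbors per vertex."""
    return sum(len(nb) for nb in adj.values()) / len(adj)

def clustering_coefficient(adj):
    """Average, over all vertices, of the fraction of neighbor pairs that are
    themselves linked. Vertices with fewer than two neighbors contribute 0."""
    total = 0.0
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            continue
        links = sum(1 for a in nb for b in nb if a < b and b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def characteristic_path_length(adj):
    """Mean shortest-path length over all reachable ordered vertex pairs,
    using a breadth-first search from every vertex."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")

# Toy, made-up similarity scores for four hypothetical proteins.
toy_sim = [
    [1.0, 0.9, 0.7, 0.1],
    [0.9, 1.0, 0.6, 0.2],
    [0.7, 0.6, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
adj = build_network(toy_sim, threshold=0.5)
print(mean_degree(adj), clustering_coefficient(adj), characteristic_path_length(adj))
```

A small-world classification is usually made by comparing the last two values against those of a random graph with the same number of vertices and edges: clustering should be much higher while path length stays comparable. That comparison is left out of the sketch.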

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Cluster Analysis
  • Databases, Protein
  • Models, Molecular
  • Protein Conformation*
  • Proteins / chemistry*
  • Proteins / classification
  • Proteins / genetics
  • Proteomics / methods*
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*

Substances

  • Proteins