A similarity network approach for the analysis and comparison of protein sequence/structure sets

Ioannis Valavanis; George Spyrou; Konstantina Nikita

doi:10.1016/j.jbi.2010.01.005

J Biomed Inform. 2010 Apr;43(2):257-67. doi: 10.1016/j.jbi.2010.01.005. Epub 2010 Jan 25.

Authors

Ioannis Valavanis¹, George Spyrou, Konstantina Nikita

Affiliation

¹ School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou Str., Zografos, 15780 Athens, Greece. ivalavan@biosim.ntua.gr

PMID: 20097308
DOI: 10.1016/j.jbi.2010.01.005

Abstract

A set of proteins is a complex system whose elements are interrelated through sequence- and structure-based similarity. Here, we applied a similarity network-based methodology to represent and analyze sets of protein sequences and structures, using a non-redundant set of 311 proteins and three different information criteria based on sequence-derived features, local sequence alignment and structural alignment. A wide set of measurements, such as network degree, clustering coefficient, characteristic path length and vertex centrality, was used to characterize the networks' topology. Protein similarity networks were found to be moderately or highly interconnected, and the existence of both clusters and random edges classified their fully connected versions as Small World Networks (SWNs). The SWN architecture was able to host the continuous similarity transition among proteins and to model the protein information flow during evolution. Recently reported ancestral elements, such as the alpha/beta class and certain folds, were notably found to act as hubs in the networks. Additionally, the moderate information value of sequence-derived features when used for fold and class assignment was shown on a network basis. The methodology described here can be applied to the analysis of other complex systems that consist of interrelated elements and a certain information flow.
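
The topology measurements named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how similarities were turned into edges, so the threshold rule, the toy 5x5 similarity matrix and all function names here are assumptions made for illustration. The random-graph reference values use the classic Watts-Strogatz approximations (C_rand ≈ k/n, L_rand ≈ ln n / ln k), which may differ from the criterion used in the paper.

```python
from collections import deque
from itertools import combinations
from math import log


def build_network(sim, threshold):
    """Link i and j when sim[i][j] >= threshold (assumed edge rule)."""
    n = len(sim)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if sim[i][j] >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def mean_degree(adj):
    """Average number of neighbours per vertex."""
    return sum(len(nb) for nb in adj.values()) / len(adj)


def clustering_coefficient(adj):
    """Average local clustering; vertices with degree < 2 contribute 0."""
    total = 0.0
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean BFS shortest-path length over all reachable ordered pairs."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


def random_graph_baseline(n, k):
    """Watts-Strogatz approximations for a random graph; requires k > 1."""
    return k / n, log(n) / log(k)


# Hypothetical similarity scores for five proteins (not data from the paper).
toy_sim = [
    [1.0, 0.9, 0.8, 0.2, 0.1],
    [0.9, 1.0, 0.7, 0.1, 0.2],
    [0.8, 0.7, 1.0, 0.6, 0.1],
    [0.2, 0.1, 0.6, 1.0, 0.9],
    [0.1, 0.2, 0.1, 0.9, 1.0],
]
toy_adj = build_network(toy_sim, threshold=0.5)
k = mean_degree(toy_adj)
c = clustering_coefficient(toy_adj)
l = characteristic_path_length(toy_adj)
c_rand, l_rand = random_graph_baseline(len(toy_adj), k)
print(f"k={k:.2f} C={c:.3f} L={l:.2f} C_rand={c_rand:.3f} L_rand={l_rand:.2f}")
```

In this reading, a network is treated as small-world when its clustering coefficient is much larger than the random-graph value while its characteristic path length stays comparable to it. The five-vertex example only demonstrates the calculations and is far too small to support such a classification.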

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Amino Acid Sequence
Cluster Analysis
Databases, Protein
Models, Molecular
Protein Conformation*
Proteins / chemistry*
Proteins / classification
Proteins / genetics
Proteomics / methods*
Sequence Alignment / methods*
Sequence Analysis, Protein / methods*

Substances

Proteins