Identification of subfamily-specific sites based on active sites modeling and clustering

Bioinformatics. 2010 Dec 15;26(24):3075-82. doi: 10.1093/bioinformatics/btq595. Epub 2010 Oct 26.

Abstract

Motivation: Current computational approaches to function prediction are mostly based on protein sequence classification and transfer of annotation from known proteins to their closest homologous sequences, relying on the orthology concept of function conservation. This approach suffers from a major weakness: annotation reliability depends on global sequence similarity to known proteins, and the approach performs poorly for enzyme superfamilies whose members catalyze different reactions. Structural biology offers a different strategy to overcome the problem of annotation by adding information about protein 3D structures. This information can be used to identify amino acids located in active sites, focusing on the detection of functionally polymorphic residues within an enzyme superfamily. Structural genomics programs are providing more and more novel protein structures at a high-throughput rate. However, there is still a huge gap between the number of sequences and the number of available structures. Computational methods such as homology modeling provide reliable approaches to bridge this gap and could be a new, precise tool to annotate protein functions.

Results: Here, we present the Active Sites Modeling and Clustering (ASMC) method, a novel unsupervised method to classify sequences using structural information of protein pockets. ASMC combines homology modeling of family members, structural alignment of the modeled active sites and a subsequent hierarchical conceptual classification. Comparison of the profiles obtained from the computed clusters allows the identification of residues correlated with subfamily function divergence, called specificity determining positions. The ASMC method has been validated on a benchmark of 42 Pfam families for which previously resolved holo-structures were available. ASMC was also applied to several families containing known protein structures and comprehensive functional annotations. We discuss how ASMC improves the annotation and understanding of protein family functions by giving specific illustrative examples on nucleotidyl cyclases, protein kinases and serine proteases.
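
The profile-comparison step described above can be illustrated with a minimal Python sketch. It is not the ASMC implementation: it assumes the active-site residues have already been aligned and grouped into clusters, and it replaces the hierarchical conceptual classification with a simple conservation rule. The function name, the `min_conservation` threshold and the toy residue strings are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def specificity_determining_positions(clusters, min_conservation=0.8):
    """Return positions conserved within each cluster but differing between clusters.

    clusters: dict mapping a cluster name to a list of equal-length strings,
    each string holding the aligned active-site residues of one protein.
    This conservation rule is an assumption made for the sketch, not the
    criterion used by ASMC.
    """
    length = len(next(iter(clusters.values()))[0])
    sdps = []
    for pos in range(length):
        consensus = {}
        for name, seqs in clusters.items():
            # Most frequent residue at this position within the cluster.
            residue, count = Counter(s[pos] for s in seqs).most_common(1)[0]
            if count / len(seqs) < min_conservation:
                break  # position is not conserved inside this cluster
            consensus[name] = residue
        else:
            # Conserved in every cluster: keep it only if the clusters disagree.
            if len(set(consensus.values())) > 1:
                sdps.append((pos, consensus))
    return sdps


# Toy data (made up, not taken from any real protein family).
toy_clusters = {
    "A": ["DKGSH", "DKGSH", "DKGTH"],
    "B": ["DRGAH", "DRGAH", "DRGAH"],
}
result = specificity_determining_positions(toy_clusters)
print(result)
```

With these toy clusters, only position 1 (0-based) is reported: it is K in every member of cluster A and R in every member of cluster B. Position 3 is skipped at the default threshold because cluster A is only two-thirds conserved there.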

Availability: http://www.genoscope.fr/ASMC/.

Publication types

  • Validation Study

MeSH terms

  • Catalytic Domain
  • Cluster Analysis
  • Computational Biology / methods
  • Enzymes / classification
  • Models, Biological
  • Molecular Sequence Annotation
  • Phosphorus-Oxygen Lyases / chemistry
  • Protein Kinases / chemistry
  • Proteins / chemistry
  • Proteins / classification*
  • Proteins / metabolism
  • Sequence Alignment
  • Sequence Analysis, Protein / methods*
  • Serine Proteases / chemistry

Substances

  • Enzymes
  • Proteins
  • Protein Kinases
  • Serine Proteases
  • Phosphorus-Oxygen Lyases