Comparing graph representations of protein structure for mining family-specific residue-based packing motifs

J Comput Biol. 2005 Jul-Aug;12(6):657-71. doi: 10.1089/cmb.2005.12.657.

Abstract

We find recurring amino-acid residue packing patterns, or spatial motifs, that are characteristic of protein structural families, by applying a novel frequent subgraph mining algorithm to graph representations of protein three-dimensional structure. Graph nodes represent amino acids, and edges are chosen in one of three ways: first, using a threshold for contact distance between residues; second, using Delaunay tessellation; and third, using the recently developed almost-Delaunay edges. For a set of graphs representing a protein family from the Structural Classification of Proteins (SCOP) database, subgraph mining typically identifies several hundred common subgraphs corresponding to spatial motifs that are frequently found in proteins in the family but rarely found outside of it. We find that some of the large motifs map onto known functional regions in two protein families explored in this study, i.e., serine proteases and kinases. We find that graphs based on almost-Delaunay edges significantly reduce the number of edges in the graph representation and hence present computational advantage, yet the patterns extracted from such graphs have a biological interpretation approximately equivalent to that of those extracted from distance based graphs.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms*
  • Amino Acid Motifs
  • Computational Biology*
  • Computer Graphics*
  • Databases, Protein
  • Models, Molecular
  • Models, Statistical
  • Molecular Structure
  • Proteins / chemistry*
  • Proteins / classification*
  • Structural Homology, Protein*

Substances

  • Proteins