Progress: simultaneous searching of protein databases by sequence and structure

Pac Symp Biocomput. 2004:264-75. doi: 10.1142/9789812704856_0026.

Abstract

We consider the problem of similarity searches on protein databases based on both sequence and structure information simultaneously. Our program extracts feature vectors from both the sequence and structure components of the proteins. These feature vectors are then combined and indexed using a novel multi-dimensional index structure. For a given query, we employ this index structure to find candidate matches from the database. We develop a new method for computing the statistical significance of these candidates. The candidates with high significance are then aligned to the query protein using the Smith-Waterman technique to find the optimal alignment. The experimental results show that our method can correctly classify up to 97% of the superfamilies and up to 100% of the classes according to the SCOP classification. Our method is up to 37 times faster than CTSS, a recent structure search technique, combined with the Smith-Waterman technique for sequences.
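
The final alignment step named in the abstract can be illustrated with a minimal sketch of textbook Smith-Waterman local alignment. This is not the authors' implementation: the function name `smith_waterman`, the match/mismatch scores, and the linear gap penalty are assumptions chosen for brevity, whereas protein alignment in practice typically uses a substitution matrix and affine gap penalties. The feature extraction, index structure, and significance computation are not specified in the abstract and are not sketched here.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Two toy sequences sharing the local segment "WKLM".
    print(smith_waterman("GGGWKLMGGG", "PPWKLMPP"))
```

Because the dynamic program costs time proportional to the product of the two sequence lengths, running it only on candidates that pass an index-based filter, as the abstract describes, avoids aligning the query against every database entry.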

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Abstracting and Indexing
  • Amino Acid Sequence
  • Computational Biology*
  • Databases, Protein*
  • Molecular Structure
  • Proteins / chemistry*
  • Proteins / genetics*
  • Software*

Substances

  • Proteins