Characterization of novel proteins based on known protein structures

J Mol Biol. 2000 Mar 3;296(4):1139-52. doi: 10.1006/jmbi.1999.3501.

Abstract

The genome sciences face the challenge of characterizing the structure and function of a vast number of novel genes. Sequence search techniques are used to infer functional and structural information from similarities to experimentally characterized genes or proteins. The persistent goal is to refine these techniques and to develop alternative and complementary methods to increase the range of reliable inference. Here, we focus on the structural and functional assignments that can be inferred from the known three-dimensional structures of proteins. The study uses all structures in the Protein Data Bank that were known by the end of 1997. The protein structures released in 1998 were then characterized in terms of functional and structural similarity to the previously known structures, yielding an estimate of the maximum amount of information on novel protein sequences that can be obtained from inference techniques. The 147 globular proteins released in 1998, corresponding to 196 domains, have no clear sequence similarity to previously known structures. However, 75% of the domains have extensive structural similarity to previously known folds and, most importantly, in two out of three cases similarity in structure coincides with related function. In view of this analysis, full utilization of existing structure databases would provide information for many new targets even if the relationship is not accessible from sequence information alone. Currently, the most sophisticated techniques detect on the order of one-third of these relationships.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Bacterial Proteins / chemistry
  • Carrier Proteins / chemistry
  • Desulfovibrio vulgaris
  • Flavoproteins*
  • Models, Chemical
  • Models, Molecular
  • Molecular Sequence Data
  • Protein Conformation*
  • Sequence Homology, Amino Acid

Substances

  • Bacterial Proteins
  • Carrier Proteins
  • FMN-binding protein, Desulfovibrio vulgaris
  • Flavoproteins