Strategies for the identification, the assembly and the classification of integrated biological systems in completely sequenced genomes

Comput Chem. 2002 Jul;26(5):447-57. doi: 10.1016/s0097-8485(02)00007-4.

Abstract

The proteins involved in a single biological process may form a stable supramolecular assembly or interact only transiently. Although the first annotation steps of a complete genome may identify the individual partners, assembling them into a functional system, referred to here as an integrated system, still requires methodological work. In particular, the knowledge required to assemble the partners of such systems should be explicitly encoded in annotation software. The availability of a complete genome, and therefore of all the proteins it encodes, has motivated the development of automated approaches that coordinate different bioinformatic methods to identify the partners, assemble them, and classify the reconstructed systems into functional categories. In this data flow, the identification of the partner sequences is the principal bottleneck. Here, we describe and compare the results obtained with different classes of methods (BLASTP2, PSI-BLAST, MAST and META-MEME) applied to the identification, in complete genomes, of a given family of integrated systems: the ABC transporters. PSI-BLAST appears to significantly outperform the motif-based methods, and the results are discussed in terms of the nature of the proteins and the structure of the sub-families.
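One way to compare identification methods like those named above is to score each method's hit list against a curated reference set of partner proteins. The sketch below is only an illustration of that idea, not the evaluation procedure used in the study: the function name `score_hits`, the method labels, and the protein identifiers are all hypothetical, and the choice of sensitivity and precision as metrics is an assumption.

```python
def score_hits(predicted, reference):
    """Return (sensitivity, precision) for a collection of predicted protein IDs.

    Both arguments are iterables of identifiers; empty inputs yield 0.0
    for the affected metric rather than raising a division error.
    """
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    sensitivity = true_pos / len(reference) if reference else 0.0
    precision = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, precision


# Made-up identifiers, for illustration only; these are not data from the study.
reference = {"p1", "p2", "p3", "p4"}
hits_by_method = {
    "method_a": {"p1", "p2", "p3", "p9"},
    "method_b": {"p1", "p9"},
}

scores = {name: score_hits(hits, reference) for name, hits in hits_by_method.items()}
for name, (sens, prec) in scores.items():
    print(f"{name}: sensitivity={sens:.2f} precision={prec:.2f}")
```

In practice the hit sets would be parsed from each tool's output for a given genome, and the scores could be broken down per sub-family to see where a method loses partners.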

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Binding Sites
  • Computational Biology / methods*
  • Genome, Archaeal
  • Genome, Bacterial
  • Genomics / methods*
  • Protein Structure, Tertiary
  • Proteome / chemistry*
  • Proteome / classification
  • Proteome / genetics
  • Proteome / metabolism*
  • Software*

Substances

  • Proteome