Structural annotation of Mycobacterium tuberculosis proteome

PLoS One. 2011;6(10):e27044. doi: 10.1371/journal.pone.0027044. Epub 2011 Oct 31.

Abstract

Of the ∼4000 ORFs identified through the genome sequence of Mycobacterium tuberculosis (TB) H37Rv, experimentally determined structures are available for 312. Since knowledge of protein structures is essential to obtain a high-resolution understanding of the underlying biology, we seek to obtain a structural annotation for the genome using computational methods. Structural models were obtained and validated for ∼2877 ORFs, covering ∼70% of the genome. Functional annotation of each protein was based on fold-based functional assignments and a novel binding-site-based ligand association. New algorithms for binding site detection and genome-scale binding site comparison at the structural level, recently reported from the laboratory, were utilized. Besides these, the annotation covers detection of various sequence and sub-structural motifs and quaternary structure predictions based on the corresponding templates. The study provides an opportunity to obtain a global perspective of the fold distribution in the genome. The annotation indicates that cellular metabolism can be achieved with only 219 folds. New insights about the folds that predominate in the genome, as well as the fold combinations that make up multi-domain proteins, are also obtained. A total of 1728 binding pockets have been associated with ligands through binding site identification and sub-structure similarity analyses. The resource (http://proline.physics.iisc.ernet.in/Tbstructuralannotation), being one of the first to be based on structure-derived functional annotations at a genome scale, is expected to be useful for better understanding of TB and for application in drug discovery. The reported annotation pipeline is fairly generic and can be applied to other genomes as well.

MeSH terms

  • Amino Acid Sequence
  • Bacterial Proteins / chemistry*
  • Bacterial Proteins / metabolism*
  • Computational Biology*
  • Gene Expression Regulation, Bacterial
  • Genome, Bacterial*
  • Molecular Sequence Data
  • Mycobacterium tuberculosis / metabolism*
  • Protein Conformation
  • Proteome / analysis*
  • Sequence Homology, Amino Acid

Substances

  • Bacterial Proteins
  • Proteome