Using the underlying biological organization of the Mycobacterium tuberculosis functional network for protein function prediction

Infect Genet Evol. 2012 Jul;12(5):922-32. doi: 10.1016/j.meegid.2011.10.027. Epub 2011 Nov 7.

Abstract

Despite ever-increasing amounts of sequence and functional genomics data, many newly sequenced proteins still lack functional annotation. In Mycobacterium tuberculosis (MTB), more than half of the genome remains uncharacterized, which hampers the search for new drug targets within the bacterial pathogen and limits our understanding of its pathogenicity. As for many other genomes, MTB protein annotations have generally been inferred from sequence homology, an approach that is effective but of limited applicability. We have carried out large-scale biological data integration to produce an MTB protein functional interaction network. Protein functional relationships were extracted from the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database and supplemented with additional functional interactions derived from microarray, sequence and protein signature data. The confidence level of protein relationships in the additional functional interaction data was evaluated using a dynamic data-driven scoring system. This functional network was used to predict the functions of uncharacterized proteins in terms of Gene Ontology (GO) terms, with the semantic similarity between these terms measured using a state-of-the-art GO similarity metric. To achieve a better trade-off among prediction quality, genomic coverage and scalability, the prediction follows the key principles that drive the biological organization of the functional network. This study yields a newly functionally characterized proteome for MTB strain CDC1551, in which 3804 and 3698 of its 4195 proteins are annotated in terms of the biological process and molecular function ontologies, respectively. These data can contribute to the development of effective anti-tubercular drugs with novel biological mechanisms of action.
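The core idea behind network-based function prediction is "guilt by association": an uncharacterized protein is likely to share GO terms with the characterized proteins it is functionally linked to, weighted by how confident those links are. The sketch below illustrates only that generic idea with a simple confidence-weighted neighbour vote. It is not the authors' method, which additionally uses GO semantic similarity and the organizational principles of the network. The function name `predict_go_terms`, the 0.5 threshold, the protein IDs `P1`–`P4` and the edge confidences are all hypothetical; the confidences merely stand in for the combined STRING and data-driven scores described above.

```python
from collections import defaultdict


def predict_go_terms(edges, annotations, protein, threshold=0.5):
    """Hypothetical sketch: score GO terms for `protein` by weighted neighbour voting.

    edges: iterable of (u, v, confidence) tuples, confidence assumed in (0, 1].
    annotations: dict mapping already-characterized proteins to sets of GO IDs.
    Returns {go_term: score} for terms whose normalized score >= threshold.
    """
    votes = defaultdict(float)
    total = 0.0
    for u, v, weight in edges:
        # Identify the functional partner of `protein`; skip unrelated edges and self-loops.
        if u == protein and v != protein:
            neighbour = v
        elif v == protein and u != protein:
            neighbour = u
        else:
            continue
        terms = annotations.get(neighbour)
        if not terms:
            continue  # uncharacterized neighbours contribute no evidence
        total += weight
        for term in terms:
            votes[term] += weight
    if total == 0.0:
        return {}  # no annotated neighbours: no prediction
    # Normalize each term's vote by the total confidence of annotated neighbours.
    return {term: score / total for term, score in votes.items() if score / total >= threshold}
```

For example, with edges `("P1", "P2", 0.9)`, `("P1", "P3", 0.6)` and `("P1", "P4", 0.4)`, where P2 and P3 are annotated with GO:0006412 and P3 and P4 with GO:0005975, P1 receives both terms, with GO:0006412 scoring higher because its supporting links carry more confidence. Raising the threshold trades genomic coverage for precision, which is the kind of trade-off the abstract refers to.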

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacterial Proteins / classification
  • Bacterial Proteins / genetics
  • Bacterial Proteins / physiology*
  • Databases, Protein
  • Genes, Bacterial / genetics
  • Genes, Bacterial / physiology
  • Models, Statistical
  • Mycobacterium tuberculosis / genetics
  • Mycobacterium tuberculosis / physiology*
  • Oligonucleotide Array Sequence Analysis
  • Protein Interaction Maps*
  • Proteome / analysis
  • Proteome / genetics
  • Proteomics / methods*
  • ROC Curve

Substances

  • Bacterial Proteins
  • Proteome