Genome-wide operon prediction in Staphylococcus aureus

Nucleic Acids Res. 2004 Jul 13;32(12):3689-702. doi: 10.1093/nar/gkh694. Print 2004.

Abstract

Identifying operon structure in bacteria is critical for understanding gene regulation, gene function and pathogenesis, and for identifying targets for the development of new antibiotics. Recently, the complete genome sequences of a large number of important human bacterial pathogens have become available for computational analysis, including that of the major human Gram-positive pathogen Staphylococcus aureus. By annotating the predicted operon structure of the S.aureus genome, we hope to facilitate exploration of the unique biology of this organism as well as comparative genomics across a broad range of bacteria. We have integrated several operon prediction methods and developed a consensus approach to score the likelihood that each adjacent gene pair is co-transcribed. Gene pairs were separated into distinct operons when their scores were equal to or below an empirical threshold. Using this approach, we have generated a S.aureus genome map with scores annotated at the junction of every adjacent gene pair. This approach predicted about 864 monocistronic transcripts and 533 polycistronic operons from the protein-encoding genes in the S.aureus strain Mu50 genome. When compared with a set of experimentally determined S.aureus operons from the literature, this method correctly predicted at least 91% of gene pairs. At the transcription unit level, it correctly identified at least 92% of the complete operons in this dataset. This consensus approach has enabled us to predict operons with high accuracy in a genome for which limited experimental evidence of operon structure is available.
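The thresholding step described in the abstract can be illustrated with a minimal sketch. It assumes that a consensus score is already available for every adjacent gene pair. It does not reproduce the paper's scoring method, which the abstract does not detail. The function name `split_into_operons`, the gene names, the scores and the threshold value are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def split_into_operons(
    genes: Sequence[str],
    pair_scores: Sequence[float],
    threshold: float,
) -> List[List[str]]:
    """Group an ordered gene list into predicted transcription units.

    pair_scores[i] is the (assumed, precomputed) consensus score for the
    adjacent pair (genes[i], genes[i + 1]). A score equal to or below the
    threshold places a boundary between the two genes.
    """
    if not genes:
        return []
    if len(pair_scores) != len(genes) - 1:
        raise ValueError("expected one score per adjacent gene pair")

    operons: List[List[str]] = [[genes[0]]]
    for gene, score in zip(genes[1:], pair_scores):
        if score <= threshold:
            operons.append([gene])    # boundary: start a new unit
        else:
            operons[-1].append(gene)  # co-transcribed with the previous gene
    return operons


# Hypothetical gene names, scores and threshold, for illustration only.
genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
pair_scores = [0.9, 0.2, 0.5, 0.8]
operons = split_into_operons(genes, pair_scores, threshold=0.5)

# Units with one gene are monocistronic; longer units are polycistronic.
monocistronic = [op for op in operons if len(op) == 1]
polycistronic = [op for op in operons if len(op) > 1]
print(operons)
```

In this toy input the pair scoring exactly 0.5 is split, because the abstract states that pairs are separated when the score is equal to or below the threshold. A real pipeline would presumably also need gene order, strand and the individual prediction methods, none of which are modelled here.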

Publication types

  • Evaluation Study

MeSH terms

  • Chromosome Mapping
  • Computational Biology / methods*
  • Genome, Bacterial*
  • Genomics / methods*
  • Operon*
  • Staphylococcus aureus / genetics*
  • Transcription, Genetic