A novel bacterial gene-finding system with improved accuracy in locating start codons

DNA Res. 2001 Jun 30;8(3):97-106. doi: 10.1093/dnares/8.3.97.

Abstract

Although a number of bacterial gene-finding programs have been developed, there is still room for improvement, especially in correctly detecting translation start sites. We developed a novel bacterial gene-finding program named GeneHacker Plus. Like many others, it is based on a hidden Markov model (HMM) with duration. However, it is a 'local' model in the sense that the model starts from the translation control region and ends at the stop codon of a coding region. Multiple coding regions are identified as partial paths, like local alignments in the Smith-Waterman algorithm, regardless of how they overlap. Moreover, our semiautomatic procedure for constructing the model of the translation control region allows the inclusion of an additional conserved element as well as the ribosome-binding site. We confirmed that GeneHacker Plus is one of the most accurate programs in terms of both finding potential coding regions and precisely locating translation start sites. GeneHacker Plus is also equipped with an option in which the results from database homology searches are directly embedded in the HMM. Although this option does not raise the overall predictability, labeled similarity information can be of practical use. GeneHacker Plus can be accessed freely at http://elmo.ims.u-tokyo.ac.jp/GH/.
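
The start-site problem described above can be illustrated with a minimal sketch. This is not GeneHacker Plus and uses no HMM: it only enumerates, on one strand, every region that runs from a candidate start codon to the next in-frame stop codon, and it reports overlapping regions independently. The function name `candidate_regions`, the `min_codons` parameter, and the start-codon set (ATG, GTG, TTG) are assumptions chosen for illustration.

```python
# Illustrative sketch only: enumerate start-to-stop regions on one strand.
# Assumed codon sets: common bacterial starts and the three standard stops.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_regions(seq, min_codons=2):
    """Return sorted (start, end) pairs, 0-based, end exclusive.

    Each pair spans one candidate start codon through the first in-frame
    stop codon (inclusive). min_codons counts codons before the stop.
    """
    seq = seq.upper()
    regions = []
    for frame in range(3):
        open_starts = []  # starts seen since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                # Every open start shares this stop: one region per start.
                for s in open_starts:
                    if (i - s) // 3 >= min_codons:
                        regions.append((s, i + 3))
                open_starts = []
            elif codon in START_CODONS:
                open_starts.append(i)
    return sorted(regions)


if __name__ == "__main__":
    # Two candidate starts (ATG at 0, GTG at 6) share the stop at 12.
    print(candidate_regions("ATGAAAGTGCCCTAA"))
```

Several candidate starts typically share a single stop codon, so the sequence alone leaves the true start ambiguous. A model of the translation control region, such as the one the abstract describes, is one way to rank these candidates.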

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Archaeoglobus fulgidus / genetics
  • Bacillus subtilis / genetics
  • Base Sequence
  • Binding Sites
  • Codon, Initiator / analysis*
  • Codon, Initiator / genetics*
  • Computational Biology / methods*
  • Cyanobacteria / genetics
  • Databases, Nucleic Acid
  • Genes, Bacterial / genetics*
  • Genome, Bacterial
  • Helicobacter pylori / genetics
  • Internet
  • Markov Chains
  • Methanobacterium / genetics
  • Models, Genetic
  • Peptide Chain Initiation, Translational / genetics*
  • Regulatory Sequences, Nucleic Acid / genetics
  • Reproducibility of Results
  • Ribosomes / metabolism
  • Sensitivity and Specificity

Substances

  • Codon, Initiator