Gene calling and bacterial genome annotation with BG7

Methods Mol Biol. 2015:1231:177-89. doi: 10.1007/978-1-4939-1720-4_12.

Abstract

New massively parallel sequencing technologies are providing many bacterial genome sequences from diverse taxa, but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Bacterial genome annotation has therefore emerged as a key issue in the study of bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of non-protein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of sequencing errors and imperfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences, which are the elements directly related to biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. The tool is tolerant of sequence errors and maintains its capabilities when annotating highly fragmented genomes or mixed sequences coming from several genomes (such as those obtained from metagenomic samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure based entirely on cloud computing (Amazon Web Services).
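To make the protein-centered idea concrete, the following minimal Python sketch translates a contig in all six reading frames, which is possible without gene models because bacterial genes lack introns. It is an illustration only and not BG7's implementation, which the abstract does not detail. The function names (`reverse_complement`, `translate`, `six_frame_translations`) and the example contig are hypothetical, the standard genetic code is assumed, and codons containing ambiguous bases are rendered as `X` as one simple way to keep going past sequencing errors.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# alternative start-codon handling; '*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(contig):
    """Map (strand, offset) to the protein translation of that reading frame."""
    contig = contig.upper()
    frames = {}
    for strand, seq in (("+", contig), ("-", reverse_complement(contig))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical toy contig, not taken from any real genome.
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

In a protein-centered workflow, translated frames like these would be compared against reference protein sequences, so that a gene is called where a similar protein is found rather than where a complete, error-free open reading frame happens to survive assembly.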

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacteria / genetics
  • Base Sequence
  • Contig Mapping / methods*
  • Electronic Data Processing
  • Genome, Bacterial*
  • High-Throughput Nucleotide Sequencing*
  • Metagenomics
  • Molecular Sequence Annotation / methods*
  • Molecular Sequence Annotation / statistics & numerical data
  • Molecular Sequence Data
  • Sequence Alignment
  • Sequence Analysis, DNA / instrumentation*
  • Sequence Analysis, DNA / methods
  • Software*