Whole-Genome Annotation with BRAKER

Methods Mol Biol. 2019:1962:65-95. doi: 10.1007/978-1-4939-9173-0_5.

Abstract

BRAKER is a pipeline for highly accurate and fully automated gene prediction in novel eukaryotic genomes. It combines two major tools: GeneMark-ES/ET and AUGUSTUS. GeneMark-ES/ET learns its parameters from a novel genomic sequence in a fully automated fashion; if available, it uses extrinsic evidence for model refinement. From the protein-coding genes predicted by GeneMark-ES/ET, we select a set for training AUGUSTUS, one of the most accurate gene finding tools that, in contrast to GeneMark-ES/ET, integrates extrinsic evidence already into the gene prediction step. The first published version, BRAKER1, integrated genomic footprints of unassembled RNA-Seq reads into the training as well as into the prediction steps. The pipeline has since been extended to the integration of data on mapped cross-species proteins, and to the usage of heterogeneous extrinsic evidence, both RNA-Seq and protein alignments. In this book chapter, we briefly summarize the pipeline methodology and describe how to apply BRAKER in environments characterized by various combinations of external evidence.

Keywords: AUGUSTUS; BRAKER; Gene prediction; GeneMark-ES/ET; Genome annotation pipeline; Protein mapping to genome; Protein-coding genes; RNA-Seq reads.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Genome*
  • Genomics / methods
  • Internet
  • Molecular Sequence Annotation / methods*
  • Software*
  • User-Computer Interface