EuGene: An Automated Integrative Gene Finder for Eukaryotes and Prokaryotes

Methods Mol Biol. 2019:1962:97-120. doi: 10.1007/978-1-4939-9173-0_6.

Abstract

EuGene is an integrative gene finder applicable to both prokaryotic and eukaryotic genomes. EuGene annotated its first genome in 1999. Starting from genomic DNA sequences representing a complete genome, EuGene is able to predict the major transcript units in the genome from a variety of sources of information: statistical information, similarities with known transcripts and proteins, and any GFF3-structured information supporting the presence or absence of specific types of elements. EuGene has been used to find genes in the plants Arabidopsis thaliana, Medicago truncatula, and Theobroma cacao; in the tomato, sunflower, and Rosa genomes; and in the genome of the nematode Meloidogyne incognita, among many others. The large fraction of plants in this list probably influenced EuGene's development, especially its capacity to withstand genomes with a large number of repeated regions and transposable elements.

Depending on the sources of information used for prediction, EuGene can be considered purely ab initio, purely similarity-based, or hybrid. With the general availability of NGS-transcribed sequence data in genome projects, EuGene adopts a default hybrid behavior that relies strongly on similarity information. Initially targeted at eukaryotic genomes, EuGene has also been extended to offer integrative gene prediction for bacteria, allowing for richer and more robust predictions than either purely statistical or homology-based prokaryotic gene finders.

This text has been written as a practical guide that will give you the capacity to train and execute EuGene on your favorite eukaryotic genome. As the prokaryotic case is simpler and has already been described, only the main differences with the eukaryotic version are reported.

Keywords: EuGene; Integrative gene finder; Non-coding genes; Prokaryotic and eukaryotic genomes; Protein-coding genes.

MeSH terms

  • Arabidopsis / genetics
  • Computational Biology / methods*
  • Databases, Genetic
  • Eukaryotic Cells*
  • Internet
  • Machine Learning
  • Models, Statistical
  • Molecular Sequence Annotation
  • Plants / genetics
  • Prokaryotic Cells*
  • Proteome / genetics
  • RNA Splice Sites
  • RNA, Untranslated
  • Software*
  • Transcriptome
  • Web Browser

Substances

  • Proteome
  • RNA Splice Sites
  • RNA, Untranslated