Figmop: a profile HMM to identify genes and bypass troublesome gene models in draft genomes

Bioinformatics. 2014 Nov 15;30(22):3266-7. doi: 10.1093/bioinformatics/btu544. Epub 2014 Aug 12.

Abstract

Motivation: Gene models from draft genome assemblies of metazoan species are often incorrect, missing exons or entire genes, particularly for large gene families. Consequently, labour-intensive manual curation is often necessary. We present Figmop (Finding Genes using Motif Patterns) to help with the manual curation of gene families in draft genome assemblies. The program uses a pattern of short sequence motifs to identify putative genes directly from the genome sequence. Using a large gene family as a test case, we found Figmop to be more sensitive and specific than a BLAST-based approach. Its visualization lets potential genes be validated quickly and easily, which can save hours, if not days, of analysis.
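To make "a pattern of short sequence motifs" concrete, here is a minimal sketch of the general idea. It is not Figmop's implementation. Figmop uses a profile HMM, and this sketch substitutes a simpler technique: each motif is a position frequency matrix that is scanned along one DNA strand with a log-odds score against an assumed uniform background. A dynamic-programming step then picks the best-scoring chain of hits whose motifs appear in the expected order with bounded gaps. The helper names (`motif_from_consensus`, `scan`, `best_ordered_chain`), the toy motifs, the score threshold and `max_gap` are all hypothetical and chosen for illustration only.

```python
import math
from typing import Dict, List, Tuple

# A motif is a position frequency matrix: one {base: probability} dict per column.
Motif = List[Dict[str, float]]
# A hit is (start position, motif index, log-odds score).
Hit = Tuple[int, int, float]

BACKGROUND = {base: 0.25 for base in "ACGT"}  # assumption: uniform background


def motif_from_consensus(consensus: str, p_match: float = 0.97) -> Motif:
    """Hypothetical helper: build a toy motif that favours one base per column."""
    p_other = (1.0 - p_match) / 3
    return [{b: (p_match if b == c else p_other) for b in "ACGT"} for c in consensus]


def log_odds(motif: Motif, window: str) -> float:
    """Log2-odds of the window under the motif versus the background."""
    score = 0.0
    for column, base in zip(motif, window):
        p = column.get(base, 0.0)
        if p == 0.0:
            return float("-inf")
        score += math.log2(p / BACKGROUND[base])
    return score


def scan(motifs: List[Motif], seq: str, threshold: float) -> List[Hit]:
    """Slide every motif along the sequence and keep windows scoring >= threshold."""
    hits = []
    for m_idx, motif in enumerate(motifs):
        width = len(motif)
        for start in range(len(seq) - width + 1):
            score = log_odds(motif, seq[start:start + width])
            if score >= threshold:
                hits.append((start, m_idx, score))
    return sorted(hits)  # sorted by start, so predecessors come first in the DP


def best_ordered_chain(hits: List[Hit], motifs: List[Motif],
                       max_gap: int) -> Tuple[float, List[Hit]]:
    """Highest-scoring chain in which motif indices increase, hits do not
    overlap, and the gap between consecutive hits is at most max_gap."""
    if not hits:
        return 0.0, []
    best = [score for _, _, score in hits]
    prev = [-1] * len(hits)
    for j, (start_j, m_j, score_j) in enumerate(hits):
        for i in range(j):
            start_i, m_i, _ = hits[i]
            end_i = start_i + len(motifs[m_i])
            if m_i < m_j and end_i <= start_j and start_j - end_i <= max_gap:
                if best[i] + score_j > best[j]:
                    best[j] = best[i] + score_j
                    prev[j] = i
    j = max(range(len(hits)), key=lambda k: best[k])
    total = best[j]
    chain = []
    while j != -1:  # walk back-pointers to recover the chain
        chain.append(hits[j])
        j = prev[j]
    return total, chain[::-1]


if __name__ == "__main__":
    motifs = [motif_from_consensus("GATC"), motif_from_consensus("TTAG")]
    hits = scan(motifs, "CCGATCAAAATTAGCC", threshold=6.0)
    total, chain = best_ordered_chain(hits, motifs, max_gap=10)
    print([(start, m) for start, m, _ in chain])  # [(2, 0), (10, 1)]
```

Requiring motif order, and not just motif presence, is what separates a "motif pattern" from independent motif hits. In a real profile HMM, this ordering and the gap lengths between motifs would be modelled probabilistically instead of through a hard `max_gap` cut-off.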

Availability and implementation: Figmop is implemented in C and Python and is supported on Linux, Unix and Mac OS X. Its source code is freely available for download at https://github.com/dave-the-scientist.

Contact: curran.dave.m@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cytochrome P-450 Enzyme System / genetics
  • Genes*
  • Genomics / methods*
  • Markov Chains
  • Models, Genetic*
  • Multigene Family
  • Software*

Substances

  • Cytochrome P-450 Enzyme System