Sensommatic: an efficient pipeline to mine and predict sensory receptor genes in the era of reference-quality genomes

Bioinformatics. 2024 Jan 2;40(1):btae040. doi: 10.1093/bioinformatics/btae040.

Abstract

Summary: Sensory receptor gene families have undergone extensive expansion and loss across vertebrate evolution, leading to significant variation in receptor counts between species. However, because these gene repertoires are highly species-specific, conventional reference-based annotation tools often underestimate the true number of sensory receptors in a given species. Although the taxonomic diversity of publicly available genome assemblies has increased exponentially in recent years, only ∼30% of vertebrate species in the NCBI database are currently annotated. To overcome these limitations, we developed 'Sensommatic', an automated and accessible sensory receptor annotation pipeline. Sensommatic uses BLAST and AUGUSTUS to mine and predict sensory receptor genes from whole genome assemblies, adopting a one-to-many gene mapping approach. While designed for vertebrates, Sensommatic can be extended to run on non-vertebrate species by generating customized reference files, making it a scalable and generalizable tool.

Availability and implementation: Source code and associated files are available at: https://github.com/GMHughes/Sensommatic.
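
The one-to-many mapping idea in the Summary, in which a single reference receptor sequence may match several loci in a target genome, can be illustrated with a minimal Python sketch. This is not the Sensommatic source code. It assumes BLAST hits in the default 12-column tabular format (`-outfmt 6`), and the names `parse_outfmt6`, `merge_loci`, `one_to_many`, the example hits, and the E-value cutoff are all hypothetical choices made for illustration. The gene prediction step (AUGUSTUS) is not shown.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    query: str    # reference receptor sequence ID
    subject: str  # scaffold/chromosome ID in the target assembly
    start: int    # leftmost subject coordinate
    end: int      # rightmost subject coordinate
    strand: str   # "+" or "-"
    evalue: float


def parse_outfmt6(lines, max_evalue=1e-10):
    """Parse default BLAST tabular lines; the cutoff is an assumed value."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        sstart, send, evalue = int(f[8]), int(f[9]), float(f[10])
        if evalue > max_evalue:
            continue
        # Minus-strand hits are reported with sstart > send.
        strand = "+" if sstart <= send else "-"
        hits.append(Hit(f[0], f[1], min(sstart, send), max(sstart, send),
                        strand, evalue))
    return hits


def merge_loci(hits, max_gap=0):
    """Merge overlapping hits on the same scaffold and strand into loci."""
    by_key = defaultdict(list)
    for h in hits:
        by_key[(h.subject, h.strand)].append(h)
    loci = []
    for subject, strand in sorted(by_key):
        group = sorted(by_key[(subject, strand)], key=lambda h: h.start)
        start, end, queries = group[0].start, group[0].end, {group[0].query}
        for h in group[1:]:
            if h.start <= end + max_gap:
                end = max(end, h.end)
                queries.add(h.query)
            else:
                loci.append((subject, strand, start, end, sorted(queries)))
                start, end, queries = h.start, h.end, {h.query}
        loci.append((subject, strand, start, end, sorted(queries)))
    return loci


def one_to_many(loci):
    """Map each reference sequence to every candidate locus it matched."""
    mapping = defaultdict(list)
    for subject, strand, start, end, queries in loci:
        for q in queries:
            mapping[q].append((subject, strand, start, end))
    return dict(mapping)


# Made-up hits: one reference (OR_ref1) matching three separate loci.
EXAMPLE_HITS = [
    "OR_ref1\tscaffold_1\t85.0\t300\t45\t0\t1\t300\t1000\t1900\t1e-80\t250",
    "OR_ref1\tscaffold_1\t70.0\t300\t90\t0\t1\t300\t5000\t5900\t1e-50\t180",
    "OR_ref2\tscaffold_1\t80.0\t280\t56\t0\t10\t290\t1100\t1950\t1e-70\t230",
    "OR_ref1\tscaffold_2\t75.0\t300\t75\t0\t1\t300\t8900\t8000\t1e-60\t200",
]

if __name__ == "__main__":
    for ref, loci in one_to_many(merge_loci(parse_outfmt6(EXAMPLE_HITS))).items():
        print(ref, loci)
```

In this sketch, each merged locus would be a candidate region to pass on to a gene predictor. A one-to-one mapping would keep only the best locus per reference sequence and so would miss lineage-specific duplicates.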

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Chromosome Mapping
  • Genome*
  • Molecular Sequence Annotation
  • Software*
  • Vertebrates / genetics