GTFtools: a software package for analyzing various features of gene models

Bioinformatics. 2022 Oct 14;38(20):4806-4808. doi: 10.1093/bioinformatics/btac561.

Abstract

Motivation: Gene-centric bioinformatics studies frequently involve the calculation or the extraction of various features of genes such as splice sites, promoters, independent introns and untranslated regions (UTRs) through manipulation of gene models. Gene models are often annotated in gene transfer format (GTF) files. The features are essential for subsequent analysis such as intron retention detection, DNA-binding site identification and computing splicing strength of splice sites. Some features such as independent introns and splice sites are not provided in existing resources including the commonly used BioMart database. A package that implements and integrates functions to analyze various features of genes will greatly ease routine analysis for related bioinformatics studies. However, to the best of our knowledge, such a package is not available yet.

Results: We introduce GTFtools, a stand-alone command-line software that provides a set of functions to calculate various gene features, including splice sites, independent introns, transcription start sites (TSS)-flanking regions, UTRs, isoform coordination and length, different types of gene lengths, etc. It takes the ENSEMBL or GENCODE GTF files as input and can be applied to both human and non-human gene models like the lab mouse. We compare the utilities of GTFtools with those of two related tools: Bedtools and BioMart. GTFtools is implemented in Python and not dependent on any third-party software, making it very easy to install and use.

Availability and implementation: GTFtools is freely available at www.genemine.org/gtftools.php as well as pyPI and Bioconda.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology*
  • DNA
  • Introns
  • Software*
  • Untranslated Regions

Substances

  • DNA
  • Untranslated Regions