MITGARD: an automated pipeline for mitochondrial genome assembly in eukaryotic species using RNA-seq data

Brief Bioinform. 2021 Sep 2;22(5):bbaa429. doi: 10.1093/bib/bbaa429.

Abstract

Motivation: Over the past decade, the field of next-generation sequencing (NGS) has seen dramatic advances in methods and a decrease in costs. Consequently, a large expansion of data has been generated by NGS, most of which have originated from RNA-sequencing (RNA-seq) experiments. Because mitochondrial genes are expressed in most eukaryotic cells, mitochondrial mRNA sequences are usually co-sequenced within the target transcriptome, generating data that are commonly underused or discarded. Here, we present MITGARD, an automated pipeline that reliably recovers the mitochondrial genome from RNA-seq data from various sources. The pipeline identifies mitochondrial sequence reads based on a phylogenetically related reference, assembles them into contigs, and extracts a complete mtDNA for the target species.

Results: We demonstrate that MITGARD can reconstruct the mitochondrial genomes of several species throughout the tree of life. We noticed that MITGARD can recover the mitogenomes in different sequencing schemes and even in a scenario of low-sequencing depth. Moreover, we showed that the use of references from congeneric species diverging up to 30 million years ago (MYA) from the target species is sufficient to recover the entire mitogenome, whereas the use of species diverging between 30 and 60 MYA allows the recovery of most mitochondrial genes. Additionally, we provide a case study with original data in which we estimate a phylogenetic tree of snakes from the genus Bothrops, further demonstrating that MITGARD is suitable for use on biodiversity projects. MITGARD is then a valuable tool to obtain high-quality information for studies focusing on the phylogenetic and evolutionary aspects of eukaryotes and provides data for easily identifying a sample using barcoding, and to check for cross-contamination using third-party tools.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Bothrops / classification
  • Bothrops / genetics*
  • Eukaryotic Cells
  • Genome, Mitochondrial*
  • RNA-Seq*
  • Software*