Interrogating 1000 insect genomes for NUMTs: A risk assessment for estimates of species richness

PLoS One. 2023 Jun 8;18(6):e0286620. doi: 10.1371/journal.pone.0286620. eCollection 2023.

Abstract

The nuclear genomes of most animal species include NUMTs, segments of the mitogenome incorporated into their chromosomes. Although NUMT counts are known to vary greatly among species, there has been no comprehensive study of their frequency and attributes in the most diverse group of terrestrial organisms, insects. This study examines NUMTs derived from a 658 bp 5' segment of the cytochrome c oxidase I (COI) gene, the barcode region for the animal kingdom. This assessment is important because unrecognized NUMTs can elevate estimates of species richness obtained through DNA barcoding and derived approaches (eDNA, metabarcoding). This investigation detected nearly 10,000 COI NUMTs ≥ 100 bp in the genomes of 1,002 insect species (range = 0-443 per species). Variation in nuclear genome size explained 56% of the mitogenome-wide variation in NUMT counts. Although insect orders with the largest genome sizes possessed the highest NUMT counts, there was considerable variation among their component lineages. Two-thirds of COI NUMTs possessed an IPSC (indel and/or premature stop codon), allowing their recognition and exclusion from downstream analyses. The remainder can elevate estimates of species richness because they showed 10.1% mean divergence from their mitochondrial homologue. The extent of exposure to "ghost species" is strongly affected by the length of the target amplicon: NUMTs can raise apparent species richness by up to 22% when a 658 bp COI amplicon is examined, whereas apparent richness doubles when 150 bp amplicons are targeted. Given these impacts, metabarcoding and eDNA studies should target the longest possible amplicons and should avoid 12S/16S rDNA, which triples NUMT exposure because IPSC screens cannot be employed.
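The IPSC screen mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes that each candidate sequence has already been pairwise-aligned to its mitochondrial homologue and trimmed to the overlapping region, that gaps are written as `-`, and that the reading frame offset is known. The function names (`has_indel`, `has_premature_stop`, `has_ipsc`) are hypothetical. Stop codons follow the invertebrate mitochondrial genetic code (NCBI translation table 5), in which TAA and TAG terminate translation while TGA encodes tryptophan.

```python
# Sketch of an IPSC (indel and/or premature stop codon) screen.
# Assumptions (not from the source): inputs are pre-aligned, trimmed to the
# shared region, gaps are '-', and the reading frame offset is supplied.

# Stop codons in the invertebrate mitochondrial code (NCBI table 5).
STOP_CODONS = {"TAA", "TAG"}


def has_indel(aligned_query: str, aligned_ref: str) -> bool:
    """Return True if the pairwise alignment contains any gap."""
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    return "-" in aligned_query or "-" in aligned_ref


def has_premature_stop(seq: str, frame: int = 0) -> bool:
    """Return True if an in-frame stop codon occurs in an ungapped sequence."""
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return True
    return False


def has_ipsc(aligned_query: str, aligned_ref: str, frame: int = 0) -> bool:
    """Flag a candidate as a recognizable NUMT if it has an indel or stop."""
    if has_indel(aligned_query, aligned_ref):
        return True
    return has_premature_stop(aligned_query, frame)


if __name__ == "__main__":
    reference = "ATGTTAGGA"
    print(has_ipsc("ATGTAAGGA", reference))  # in-frame TAA
    print(has_ipsc("ATGTGAGGA", reference))  # TGA is Trp under table 5
    print(has_ipsc("ATG-TAGGA", reference))  # gap in the query
```

A sequence that passes this screen is not thereby shown to be mitochondrial. As the abstract notes, about one third of COI NUMTs lack an IPSC and so would not be caught by a check of this kind.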

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Cell Nucleus / genetics
  • DNA, Mitochondrial* / genetics
  • Genome, Insect*
  • Insecta / genetics
  • Mitochondria / genetics
  • Phylogeny
  • Risk Assessment
  • Sequence Analysis, DNA

Substances

  • DNA, Mitochondrial

Associated data

  • figshare/10.6084/m9.figshare.22939934

Grants and funding

The research was enabled by awards in support of BIOSCAN from the Government of Canada through its New Frontiers in Research Fund [NFRFT-2020-00073] and through the Large Scale Applied Research Program administered by Genome Canada and Ontario Genomics (OGI-208). PDNH gratefully acknowledges support from the Canada Research Chairs program. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.