Distinguishing gorilla mitochondrial sequences from nuclear integrations and PCR recombinants: guidelines for their diagnosis in complex sequence databases

Mol Phylogenet Evol. 2007 May;43(2):553-66. doi: 10.1016/j.ympev.2006.09.013. Epub 2006 Sep 28.

Abstract

Nuclear integrations of mitochondrial DNA (Numts) are widespread in many taxa and if left undetected can confound phylogeny interpretation and bias estimates of mitochondrial DNA (mtDNA) diversity. This is particularly true in gorillas, where recent studies suggest multiple integrations of the first hypervariable (HV1) domain of the mitochondrial control region. Problems can also arise through the inadvertent incorporation of artifacts produced by in vitro recombination between sequence types during polymerase chain reaction amplification. This issue has attracted little attention yet could potentially exacerbate errors in databases already contaminated by Numts. Using a set of existing diagnostic tools, this study set out to systematically inventory Numts and PCR recombinants in a gorilla HV1 sequence database and address the degree to which existing public databases are contaminated. Phylogenetic analysis revealed three distinct gorilla HV1 Numt groups (I, II, and III) that could be readily differentiated from mtDNA sequences by Numt-specific diagnostic sites and sequence-based motifs. Several instances of genuine recombination were also identified by a suite of detection methods. The location of putative breakpoints was identified by eye and by likelihood analysis. Findings from this study reveal widespread nuclear contamination of gorilla HV1 GenBank databases and underline the importance of recognizing not only Numts but also PCR recombinant artifacts as potential sources of data contamination. Guidelines for the routine identification of Numts and in vitro recombinants are presented and should prove useful in the detection of similar artifacts in other species' mtDNA databases.
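The idea of locating a putative breakpoint in a PCR recombinant can be illustrated with a small sketch. This is not the likelihood analysis used in the study; it is a simpler mismatch-minimizing scan, shown only to make the concept concrete. The function names (`mismatch_prefix`, `best_breakpoint`) and the toy sequences are hypothetical, and the sketch assumes the query and both candidate parental sequences are already aligned to equal length.

```python
def mismatch_prefix(query, ref):
    """Cumulative mismatch counts: out[k] = mismatches in the first k columns."""
    out = [0]
    for q, r in zip(query, ref):
        out.append(out[-1] + (q != r))
    return out


def best_breakpoint(query, parent_a, parent_b):
    """Find the column k that best explains `query` as parent_a[:k] + parent_b[k:].

    Returns (k, mosaic_mismatches, mismatches_vs_a, mismatches_vs_b).
    A mosaic cost well below both whole-parent costs hints at recombination.
    """
    if not (len(query) == len(parent_a) == len(parent_b)):
        raise ValueError("sequences must be aligned to equal length")
    n = len(query)
    pa = mismatch_prefix(query, parent_a)
    pb = mismatch_prefix(query, parent_b)
    best_k, best_cost = 0, None
    for k in range(n + 1):
        # Left of k is scored against parent A, right of k against parent B.
        cost = pa[k] + (pb[n] - pb[k])
        if best_cost is None or cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost, pa[n], pb[n]


# Toy, made-up sequences (not gorilla HV1 data).
parent_a = "AAAAAAAAAA"
parent_b = "CCCCCCCCCC"
query = "AAAACCCCCC"
result = best_breakpoint(query, parent_a, parent_b)
```

With real data the breakpoint can only be placed within the interval between the two informative sites that flank it, so the single column reported here (the leftmost minimum) should be read as one point in that interval. The orientation is also fixed (parent A on the left), so a full scan would repeat the call with the parents swapped.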

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Cell Nucleus / genetics*
  • DNA, Mitochondrial*
  • Databases, Nucleic Acid*
  • Gorilla gorilla / genetics*
  • Phylogeny
  • Polymerase Chain Reaction
  • Recombination, Genetic*
  • Sequence Analysis, DNA

Substances

  • DNA, Mitochondrial