Sliding window analyses for optimal selection of mini-barcodes, and application to 454-pyrosequencing for specimen identification from degraded DNA

PLoS One. 2012;7(5):e38215. doi: 10.1371/journal.pone.0038215. Epub 2012 May 29.

Abstract

DNA barcoding remains a challenge when applied to diet analyses, ancient DNA studies, environmental DNA samples and, more generally, in any case where DNA samples have not been adequately preserved. Because the size of the commonly used barcoding marker (COI) is over 600 base pairs (bp), amplification fails when the DNA molecule is degraded into smaller fragments. However, relevant information for specimen identification may not be evenly distributed along the barcoding region, and a shorter target can be sufficient for identification purposes. This study proposes a new, widely applicable method to compare the performance of all potential 'mini-barcodes' for a given molecular marker and to objectively select the shortest and most informative one. Our method is based on a sliding window analysis implemented in the new R package SPIDER (Species IDentity and Evolution in R). This method is applicable to any taxon and any molecular marker. Here, it was tested on earthworm DNA that had been degraded through digestion by carnivorous landsnails. A 100 bp region of 16S rDNA was selected as the shortest informative fragment (mini-barcode) required for accurate specimen identification. Corresponding primers were designed and used to amplify degraded earthworm (prey) DNA from 46 landsnail (predator) faeces using 454-pyrosequencing. This led to the detection of 18 earthworm species in the diet of the snail. We encourage molecular ecologists to use this method to objectively select the most informative region of the gene they aim to amplify from degraded DNA. The method and tools provided here can be particularly useful (1) when dealing with degraded DNA for which only small fragments can be amplified, (2) for cases where no consensus has yet been reached on the appropriate barcode gene, or (3) to allow direct analysis of short reads derived from massively parallel sequencing without the need for bioinformatic consolidation.
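The sliding window idea described above can be illustrated with a minimal Python sketch. It is not the SPIDER implementation, and the abstract does not specify which statistics the package reports. The scoring rule used here (count of between-species sequence pairs that are identical within a window, with ties broken by mean uncorrected p-distance), the function names and the toy alignment are all assumptions made for illustration.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites, skipping gaps and ambiguous bases."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)

def sliding_window_scores(seqs, species, width, step=1):
    """Score every window of `width` columns in an aligned set of sequences.

    Hypothetical scoring, chosen for this sketch only: a window is better
    when fewer between-species pairs are identical inside it.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    # Only pairs of sequences from different species are compared.
    inter = [(i, j) for i, j in combinations(range(len(seqs)), 2)
             if species[i] != species[j]]
    if not inter:
        raise ValueError("need sequences from at least two species")
    scores = []
    for start in range(0, length - width + 1, step):
        dists = [p_distance(seqs[i][start:start + width],
                            seqs[j][start:start + width])
                 for i, j in inter]
        scores.append({
            "start": start,
            "zero_pairs": sum(d == 0 for d in dists),  # indistinguishable pairs
            "mean_dist": sum(dists) / len(dists),
        })
    return scores

# Toy alignment (invented data, 12 columns, three species labelled A, B, C).
seqs = [
    "AAAAACGTAAAA",  # species A
    "AAAAACGTAAAA",  # species A
    "AAAAAGGCAAAA",  # species B
    "AAAATCTTAAAA",  # species C
]
species = ["A", "A", "B", "C"]

results = sliding_window_scores(seqs, species, width=4)
# Prefer the window with the fewest indistinguishable pairs, then the
# largest mean between-species distance.
best = min(results, key=lambda s: (s["zero_pairs"], -s["mean_dist"]))
print(best)
```

On this toy alignment the variable sites sit in columns 4 to 7, so the window starting at column 4 separates all species pairs and is the one selected. A real analysis would repeat the scan over several window widths and keep the shortest width that still separates the species of interest.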

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Computational Biology
  • DNA / chemistry*
  • DNA / genetics*
  • DNA Barcoding, Taxonomic / methods*
  • DNA, Ribosomal / chemistry
  • DNA, Ribosomal / genetics
  • Digestion
  • Genetic Markers / genetics
  • Mollusca / physiology
  • Oligochaeta / classification
  • Oligochaeta / genetics

Substances

  • DNA, Ribosomal
  • Genetic Markers
  • DNA