High-throughput sequencing of multiple amplicons for barcoding and integrative taxonomy

Sci Rep. 2017 Feb 6;7:41948. doi: 10.1038/srep41948.

Abstract

Until now, the potential of next-generation sequencing (NGS) for constructing barcode libraries or for integrative taxonomy has seldom been realised. Here, we amplified (two-step PCR) and simultaneously sequenced (MiSeq) multiple markers from hundreds of fig wasp specimens. We also developed a workflow for quality control of the data. We then compared the Illumina sequences with the Sanger sequences accumulated in past years. Interestingly, the primers and PCR conditions used for the Sanger approach did not require optimisation to construct the MiSeq library. After quality control, 87% of the species (76% of the specimens) had a valid MiSeq sequence for each marker. Importantly, major clusters did not always correspond to the targeted loci. Nine specimens exhibited two divergent sequences (up to 10% divergence). In 95% of the species, MiSeq and Sanger sequences obtained from the same sampling were similar. For the remaining 5%, species were paraphyletic or their sequences clustered into divergent groups (>7% divergence) on the combined Sanger + MiSeq trees. These problematic cases may represent coding NUMTs or heteroplasmy. Our results illustrate that Illumina approaches are not artefact-free and confirm that Sanger databases can contain non-target genes. This highlights the importance of quality control, collaboration with taxonomists and the use of multiple markers for DNA taxonomy or species diversity assessment.
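The abstract reports divergence between sequences from the same specimen or species (up to 10%, and >7% on the combined trees) but does not state here which distance measure was used. As a minimal sketch, assuming uncorrected p-distance on sequences that are already aligned (the study may have used another measure, such as the Kimura 2-parameter distance), a screening step that flags pairs above a divergence threshold could look like the following. The function names, the gap/ambiguity handling and the 0.07 default threshold are hypothetical choices for illustration, not the authors' quality-control workflow.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two aligned sequences of equal length.

    Assumption: sites where either sequence has a gap ('-') or an ambiguous
    base ('N') are skipped; the result is the proportion of differing sites
    among the remaining comparable sites.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared


def flag_divergent_pairs(seqs: dict[str, str], threshold: float = 0.07):
    """Return (id_a, id_b, distance) for every pair whose p-distance exceeds threshold.

    The 0.07 default only mirrors the ">7%" figure quoted in the abstract;
    it is a hypothetical parameter, not a value taken from the study's methods.
    """
    flagged = []
    for (id_a, s_a), (id_b, s_b) in combinations(seqs.items(), 2):
        d = p_distance(s_a, s_b)
        if d > threshold:
            flagged.append((id_a, id_b, d))
    return flagged


# Usage with made-up, pre-aligned sequences (hypothetical specimen IDs):
seqs = {
    "specimen1_sanger": "ACGTACGTACGTACGTACGT",
    "specimen1_miseq": "ACGTACGTACGTACGTACAA",
}
print(flag_divergent_pairs(seqs))
```

Comparing every pair with `itertools.combinations` is quadratic in the number of sequences, which is acceptable for screening the few sequences of one specimen or species; larger comparisons would normally rely on a dedicated alignment and tree-building toolkit.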

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • DNA Barcoding, Taxonomic*
  • Ficus / physiology*
  • Gene Expression Profiling
  • High-Throughput Nucleotide Sequencing / methods*
  • Insect Proteins / genetics*
  • Sequence Analysis, DNA / methods*
  • Wasps / classification
  • Wasps / genetics*

Substances

  • Insect Proteins