Species Identification in Malaise Trap Samples by DNA Barcoding Based on NGS Technologies and a Scoring Matrix

PLoS One. 2016 May 18;11(5):e0155497. doi: 10.1371/journal.pone.0155497. eCollection 2016.

Abstract

The German Barcoding initiatives BFB and GBOL have generated a reference library of more than 16,000 metazoan species, which is now ready for applications in next-generation molecular biodiversity assessments. To streamline the barcoding process, we developed a meta-barcoding pipeline. We pre-sorted a single Malaise trap sample (collected during one week in August 2014 in southern Germany) into 12 arthropod orders and extracted DNA from the pooled individuals of each order separately, which simplifies DNA extraction and avoids time-consuming selection of single specimens. Aliquots of each ordinal-level DNA extract were combined to roughly simulate a DNA extract from a non-sorted Malaise trap sample. Each DNA extract was amplified using four primer sets targeting the CO1-5' fragment. The resulting PCR products (150-400 bp) were sequenced separately on an Illumina MiSeq platform, yielding 1.5 million sequences and 5,500 clusters (coverage ≥10; CD-HIT-EST, 98%). Using a total of 120,000 DNA barcodes of identified Central European Hymenoptera, Coleoptera, Diptera, and Lepidoptera downloaded from BOLD, we established a reference sequence database for a local custom BLAST search. This allowed us to identify 529 Barcode Index Numbers (BINs) from the sequence clusters derived from the pooled Malaise trap samples. We introduce a scoring matrix based on the sequence match percentages of each amplicon to assess the plausibility of each detected BIN. It yielded 390 high-score BINs in the sorted samples, of which 268 (69%) were also identified in the combined sample. In our case, the time-consuming presorting step therefore recovered 122 high-score BINs (about 31% of the 390) that were not detected in the non-sorted sample. These promising results indicate that a fast, efficient, and reliable analysis of next-generation sequencing data from Malaise trap samples can be achieved using this pipeline.
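
The abstract does not spell out the scoring matrix, so the following is only a minimal sketch of the general idea: each BIN receives points from the best match percentage of each amplicon, and BINs above a cutoff count as high-score. The point thresholds, the cutoff, the function names, and the BIN identifiers are all hypothetical placeholders, not values from the study.

```python
from collections import defaultdict

# Hypothetical point scheme: (minimum % identity, points awarded).
# These thresholds are illustrative assumptions, not the published matrix.
POINTS_BY_IDENTITY = [(99.0, 3), (98.0, 2), (97.0, 1)]
HIGH_SCORE_CUTOFF = 4  # assumed cutoff for calling a BIN "high score"


def amplicon_points(identity):
    """Map one amplicon's BLAST match percentage to points."""
    for min_identity, points in POINTS_BY_IDENTITY:
        if identity >= min_identity:
            return points
    return 0


def score_bins(hits):
    """Sum points per BIN over the best hit of each amplicon.

    hits: iterable of (bin_id, amplicon_name, percent_identity) tuples.
    """
    best = defaultdict(dict)
    for bin_id, amplicon, identity in hits:
        previous = best[bin_id].get(amplicon, 0.0)
        best[bin_id][amplicon] = max(previous, identity)
    return {
        bin_id: sum(amplicon_points(i) for i in per_amplicon.values())
        for bin_id, per_amplicon in best.items()
    }


def high_score_bins(hits, cutoff=HIGH_SCORE_CUTOFF):
    """Return the set of BINs whose summed score reaches the cutoff."""
    return {b for b, s in score_bins(hits).items() if s >= cutoff}


# Made-up example hits; BIN identifiers and primer names are placeholders.
example_hits = [
    ("BOLD:AAA0001", "primer_set_1", 100.0),
    ("BOLD:AAA0001", "primer_set_2", 99.2),
    ("BOLD:AAA0001", "primer_set_3", 98.5),
    ("BOLD:AAA0002", "primer_set_1", 97.4),
    ("BOLD:AAA0002", "primer_set_1", 96.0),  # weaker duplicate hit, ignored
]

if __name__ == "__main__":
    print(score_bins(example_hits))
    print(high_score_bins(example_hits))
```

In this sketch a BIN supported by several amplicons at high identity accumulates more points than one supported by a single weak match, which is the kind of plausibility ranking the abstract describes; the actual weighting used by the authors may differ.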

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Arthropods / classification
  • Arthropods / genetics
  • Biodiversity
  • Cluster Analysis
  • Cytochromes c / genetics
  • DNA Barcoding, Taxonomic* / methods
  • Databases, Nucleic Acid
  • Germany
  • High-Throughput Nucleotide Sequencing*
  • Insecta / classification*
  • Insecta / genetics*
  • Workflow

Substances

  • Cytochromes c

Grants and funding

The project was supported by grants from the Bavarian State Government (BFB) and the German Federal Ministry of Education and Research (GBOL2:01LI1501B). LGC Genomics GmbH provided support in the form of salaries for authors BF and SA, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the 'author contributions' section.