BAGS: An automated Barcode, Audit & Grade System for DNA barcode reference libraries

Mol Ecol Resour. 2021 Feb;21(2):573-583. doi: 10.1111/1755-0998.13262. Epub 2020 Oct 28.

Abstract

Biodiversity studies benefit greatly from molecular tools such as DNA metabarcoding, which provides an effective means of identification in biomonitoring and conservation programmes. The accuracy of species-level assignment, and the resulting taxonomic coverage, relies on comprehensive DNA barcode reference libraries. The role of these libraries is to support species identification, but accidental errors in the generation of the barcodes may compromise their accuracy. Here, we present an R-based application, Barcode, Audit & Grade System (BAGS) (https://github.com/tadeu95/BAGS), that performs automated auditing and annotation of the cytochrome c oxidase subunit I (COI) sequence libraries available in the Barcode of Life Data System (BOLD) for a given taxonomic group of animals. It then applies a qualitative ranking system that assigns one of five grades (A to E) to each species in the reference library, according to the attributes of the data and the congruency of species names with sequences clustered in barcode index numbers (BINs). Our goal is to allow researchers to obtain the most useful and reliable data by highlighting and segregating records according to their congruency. Several tests were performed to assess the tool's usefulness and limitations. BAGS fills a significant gap in the current landscape of DNA barcoding research tools by quickly screening reference libraries to gauge the congruence status of the data and by facilitating the triage of ambiguous data for subsequent review. BAGS therefore has the potential to become a valuable addition to forthcoming DNA metabarcoding studies and, in the long term, to contribute to improving the quality and reliability of public reference libraries globally.
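The abstract does not state the criteria behind the five grades, so the following is only a minimal sketch of how a congruency-based ranking of this kind might work. It is written in Python for illustration and is not the BAGS code, which is R-based. The function name `grade_species`, the record thresholds (`min_records`, `consolidated_records`), the order in which the rules are checked and the meaning attached to each letter are all assumptions, not the published BAGS rules. The input is assumed to be a list of (species name, BIN) pairs, one per barcode record.

```python
from collections import defaultdict


def grade_species(records, min_records=3, consolidated_records=11):
    """Assign an illustrative grade (A to E) to each species.

    records: iterable of (species_name, bin_id) pairs, one per barcode record.
    The thresholds and rule order are hypothetical, chosen for this sketch only.
    """
    species_bins = defaultdict(set)   # species -> BINs its records fall in
    bin_species = defaultdict(set)    # BIN -> species names found in it
    counts = defaultdict(int)         # species -> number of records

    for species, bin_id in records:
        species_bins[species].add(bin_id)
        bin_species[bin_id].add(species)
        counts[species] += 1

    grades = {}
    for species, bins in species_bins.items():
        # Discordance: at least one of the species' BINs holds another species.
        shares_bin = any(len(bin_species[b]) > 1 for b in bins)
        if shares_bin:
            grades[species] = "E"
        elif counts[species] < min_records:
            grades[species] = "D"     # too few records to judge (assumed rule)
        elif len(bins) > 1:
            grades[species] = "C"     # several BINs, none shared
        elif counts[species] >= consolidated_records:
            grades[species] = "A"     # one exclusive BIN, many records
        else:
            grades[species] = "B"     # one exclusive BIN, fewer records
    return grades


# Toy records; species names and BIN identifiers are made up.
records = (
    [("sp1", "BIN1")] * 12
    + [("sp2", "BIN2")] * 5
    + [("sp3", "BIN3")] * 3 + [("sp3", "BIN4")] * 2
    + [("sp4", "BIN5")]
    + [("sp5", "BIN6")] * 4 + [("sp6", "BIN6")] * 4
)
grades = grade_species(records)
```

In this toy data set, `sp5` and `sp6` share a BIN and would be flagged as discordant, while `sp4` has too few records to be assessed. A real audit would also need to handle records that lack a BIN or a species-level name, which this sketch ignores.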

Keywords: BOLD systems; DNA metabarcoding; R; annotation; quality control; reference libraries.

MeSH terms

  • Animals
  • Biodiversity*
  • DNA
  • DNA Barcoding, Taxonomic*
  • Gene Library*
  • Reproducibility of Results
  • Software*

Substances

  • DNA