Publicly Available and Validated DNA Reference Sequences Are Critical to Fungal Identification and Global Plant Protection Efforts: A Use-Case in Colletotrichum

Plant Dis. 2022 Jun;106(6):1573-1596. doi: 10.1094/PDIS-09-21-2083-SR. Epub 2022 May 10.

Abstract

Publicly available and validated DNA reference sequences useful for phylogeny estimation and identification of fungal pathogens are an increasingly important resource in the efforts of plant protection organizations to facilitate safe international trade of agricultural commodities. Colletotrichum species are among the most frequently encountered and regulated plant pathogens at U.S. ports-of-entry. The RefSeq Targeted Loci (RTL) project at NCBI (BioProject no. PRJNA177353) contains a database of curated fungal internal transcribed spacer (ITS) sequences that interact extensively with NCBI Taxonomy, resulting in verified name-type strain-sequence associations for >12,000 species. We present a publicly available dataset of verified and curated name-type strain-sequence associations for all available Colletotrichum species. This includes an updated GenBank Taxonomy for 238 species associated with up to 11 protein-coding loci and an updated RTL ITS dataset for 226 species. We demonstrate that several marker loci are well suited for phylogenetic inference and identification. We improve understanding of phylogenetic relationships among verified species, verify or improve phylogenetic circumscriptions of 14 species complexes, and reveal that determining relationships among these major clades will require additional data. We present detailed comparisons between phylogenetic and similarity-based approaches to species identification, revealing complex patterns among single marker loci that often lead to misidentification when identification relies on single-locus similarity approaches. We also demonstrate that species-level identification is elusive for a subset of samples regardless of analytical approach, which may be explained by novel species diversity in our dataset, incomplete lineage sorting, and a lack of accumulated synapomorphies at these loci.

Keywords: Colletotrichum; DNA barcoding; DNA reference sequence; GenBank; RefSeq; fungi; plant protection; plant quarantine; systematics.

MeSH terms

  • Colletotrichum* / genetics
  • Commerce
  • DNA
  • Internationality
  • Phylogeny

Substances

  • DNA