A CRISPR-based strategy for targeted sequencing in biodiversity science

Mol Ecol Resour. 2024 Apr;24(3):e13920. doi: 10.1111/1755-0998.13920. Epub 2023 Dec 28.

Abstract

Many applications in molecular ecology require the ability to match specific DNA sequences from single- or mixed-species samples with a diagnostic reference library. Widely used methods for DNA barcoding and metabarcoding employ PCR and amplicon sequencing to identify taxa based on target sequences, but the target-specific enrichment capabilities of CRISPR-Cas systems may offer advantages in some applications. We identified 54,837 CRISPR-Cas guide RNAs that may be useful for enriching chloroplast DNA across phylogenetically diverse plant species. We tested a subset of 17 guide RNAs in vitro to enrich plant DNA strands ranging in size from diagnostic DNA barcodes of 1,428 bp to entire chloroplast genomes of 121,284 bp. We used an Oxford Nanopore sequencer to evaluate sequencing success based on both single- and mixed-species samples, which yielded mean chloroplast sequence lengths of 2,530-11,367 bp, depending on the experiment. In comparison to mixed-species experiments, single-species experiments yielded more on-target sequence reads and greater mean pairwise identity between contigs and the plant species' reference genomes. Nevertheless, the mixed-species experiments yielded sufficient data to provide a ≥48-fold increase in sequence length and better estimates of relative abundance for a commercially prepared mixture of plant species compared to DNA metabarcoding based on the chloroplast trnL-P6 marker. Prior work developed CRISPR-based enrichment protocols for long-read sequencing, and our experiments pioneered their use for plant DNA barcoding and chloroplast assemblies, which may have advantages over workflows that require PCR and short-read sequencing. Future work would benefit from continuing to develop in vitro and in silico methods for CRISPR-based analyses of mixed-species samples, especially when the appropriate reference genomes for contig assembly cannot be known a priori.
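
As a rough illustration of what an in silico search for candidate guide RNAs can look like, the following minimal sketch scans both strands of a DNA sequence for 20-nt protospacers followed by an NGG motif, then keeps the candidates shared by every input sequence. The abstract does not describe the authors' selection criteria, so the 20-nt length, the NGG motif (the one commonly associated with SpCas9), and the function names `find_candidate_guides` and `shared_guides` are all assumptions made for illustration only.

```python
import re

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_candidate_guides(seq, guide_len=20):
    """List (strand, start, protospacer) for sites followed by NGG.

    Assumption: an SpCas9-style NGG motif and a 20-nt protospacer.
    Start positions are 0-based on the strand that was scanned.
    Windows containing ambiguous bases (e.g. N) are skipped.
    """
    seq = seq.upper()
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len)
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            hits.append((strand, match.start(), match.group(1)))
    return hits


def shared_guides(sequences, guide_len=20):
    """Return protospacers that occur in every input sequence."""
    per_sequence = [
        {guide for _, _, guide in find_candidate_guides(s, guide_len)}
        for s in sequences
    ]
    return set.intersection(*per_sequence) if per_sequence else set()
```

Calling `shared_guides` on a list of chloroplast sequences from different species would return only the protospacers present in all of them, which is one simple way to think about guides that could work across phylogenetically diverse taxa. A real design pipeline would also need to consider off-target matches, mismatch tolerance, and cut-site placement relative to the region to be enriched, none of which this sketch attempts.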

Keywords: Cas9; Flye; amplicon sequencing; barcoding; contigs; environmental DNA; guide RNA; long-read sequencing; metabarcoding; metagenomics.

MeSH terms

  • Biodiversity*
  • DNA Barcoding, Taxonomic / methods
  • DNA, Plant
  • High-Throughput Nucleotide Sequencing / methods
  • RNA, Guide, CRISPR-Cas Systems*
  • Sequence Analysis, DNA / methods

Substances

  • RNA, Guide, CRISPR-Cas Systems
  • DNA, Plant