CRISPR Visualizer: rapid identification and visualization of CRISPR loci via an automated high-throughput processing pipeline

RNA Biol. 2019 Apr;16(4):577-584. doi: 10.1080/15476286.2018.1493332. Epub 2018 Aug 21.

Abstract

A CRISPR locus, defined by an array of repeat and spacer elements, constitutes a genetic record of the ceaseless battle between bacteria and viruses, showcasing the genomic integration of spacers acquired from invasive DNA. In particular, iterative spacer acquisitions represent unique evolutionary histories and are often useful for high-resolution bacterial genotyping, including comparative analysis of closely related organisms, clonal lineages, and clinical isolates. Current spacer visualization methods are typically tedious and can require manual data manipulation and curation, including extraction of the spacers at each CRISPR locus from the genomes of interest. Here, we constructed a high-throughput extraction pipeline coupled with a local web-based visualization tool that enables CRISPR spacer and repeat extraction, rapid visualization, graphical comparison, and progressive multiple sequence alignment. We present the bioinformatic pipeline and investigate the loci of reference CRISPR-Cas systems and model organisms in four well-characterized subtypes. We illustrate how this analysis uncovers the evolutionary tracks and homology shared between various organisms through visual comparison of CRISPR spacers and repeats, driven by progressive alignments. Because the pipeline processes unannotated genome files with minimal preparation and curation, it can be implemented promptly. Overall, this efficient high-throughput solution accelerates the analysis of genomic data sets and expedites genotyping efforts based on CRISPR loci.
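The core idea behind spacer extraction and spacer-based comparison can be illustrated with a minimal Python sketch. It is not the pipeline described in the paper: it assumes the repeat consensus is already known and that every repeat copy matches it exactly, whereas real arrays contain degenerate repeats that must be detected de novo. The function names `extract_spacers` and `shared_spacers`, the 20 to 60 nt spacer length bounds, and the toy sequences are all hypothetical.

```python
from typing import List


def extract_spacers(array_seq: str, repeat: str,
                    min_len: int = 20, max_len: int = 60) -> List[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`.

    Assumption (hypothetical simplification): repeats are identical and known
    in advance; segments outside [min_len, max_len] are discarded.
    """
    # Collect the start position of every non-overlapping repeat copy.
    positions = []
    start = array_seq.find(repeat)
    while start != -1:
        positions.append(start)
        start = array_seq.find(repeat, start + len(repeat))

    # A spacer is the segment between the end of one repeat and the next repeat.
    spacers = []
    for left, right in zip(positions, positions[1:]):
        spacer = array_seq[left + len(repeat):right]
        if min_len <= len(spacer) <= max_len:
            spacers.append(spacer)
    return spacers


def shared_spacers(a: List[str], b: List[str]) -> List[str]:
    """Spacers present in both arrays, kept in the order they appear in `a`."""
    b_set = set(b)
    return [s for s in a if s in b_set]


if __name__ == "__main__":
    # Toy, made-up sequences for illustration only.
    repeat = "GTTTTAGAGCTATGCTGTTTTG"
    s1, s2, s3 = "ACGTACGTACGTACGTACGT", "TTTTGGGGCCCCAAAATTTT", "GATCGATCGATCGATCGATC"
    s4 = "CCCCCCCCCCGGGGGGGGGG"
    isolate_a = repeat + s1 + repeat + s2 + repeat + s3 + repeat
    isolate_b = repeat + s4 + repeat + s2 + repeat + s3 + repeat

    spacers_a = extract_spacers(isolate_a, repeat)
    spacers_b = extract_spacers(isolate_b, repeat)
    print(shared_spacers(spacers_a, spacers_b))
```

In this toy case the two isolates share their last two spacers but differ in the first one. Because new spacers are typically integrated at the leader end of an array, a pattern like this is consistent with a common ancestor followed by independent recent acquisitions; the published tool performs this kind of comparison visually and through progressive multiple sequence alignment rather than by simple set intersection.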

Keywords: CRISPR spacer; CRISPR-Cas; crRNA; repeat detection; software.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacteria / genetics
  • Base Sequence
  • CRISPR-Cas Systems / genetics*
  • Computational Biology
  • Genetic Loci*
  • High-Throughput Nucleotide Sequencing / methods*
  • Time Factors

Grants and funding

The authors acknowledge support from NC State University and the NC Ag Foundation;