Interpretable CRISPR/Cas9 off-target activities with mismatches and indels prediction using BERT

Comput Biol Med. 2024 Feb:169:107932. doi: 10.1016/j.compbiomed.2024.107932. Epub 2024 Jan 1.

Abstract

Off-target effects of CRISPR/Cas9 can lead to suboptimal genome editing outcomes. Numerous deep learning-based approaches have achieved excellent performance for off-target prediction; however, few can predict off-target activities involving both mismatches and indels between the single guide RNA (sgRNA) and the target DNA sequence. In addition, data imbalance is a common pitfall in off-target prediction. Moreover, due to the complexity of genomic contexts, building an interpretable model remains challenging. To address these issues, we first developed a BERT-based model called CRISPR-BERT to improve the prediction of off-target activities with both mismatches and indels. Second, we proposed an adaptive batch-wise class balancing strategy to combat the noise present in imbalanced off-target data. Finally, we applied a visualization approach to investigate generalizable, nucleotide position-dependent patterns of the sgRNA-DNA pair that relate to off-target activity. In a comprehensive comparison with existing methods on five mismatches-only datasets and two mismatches-and-indels datasets, CRISPR-BERT achieved the best performance in terms of AUROC and PRAUC. In addition, the visualization analysis demonstrated how implicit knowledge learned by CRISPR-BERT facilitates off-target prediction, which shows its potential for model interpretability. Collectively, CRISPR-BERT provides an accurate and interpretable framework for off-target prediction and further contributes to sgRNA optimization in practical use for improved target specificity in CRISPR/Cas9 genome editing. The source code is available at https://github.com/BrokenStringx/CRISPR-BERT.
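
To make the notion of an sgRNA-DNA pair with mismatches and indels concrete, the following is a minimal sketch of one way such a pair could be turned into position-wise tokens. It is not taken from the CRISPR-BERT source code. The function names (`encode_pair`, `classify_token`, `summarize`), the gap symbol `-`, and the assumption that both sequences arrive pre-aligned to equal length are all hypothetical choices made for illustration.

```python
from typing import Dict, List

GAP = "-"  # assumed gap symbol marking an indel position in a pre-aligned pair


def encode_pair(sgrna: str, dna: str) -> List[str]:
    """Return one two-letter token per aligned position, e.g. 'AA', 'CG', '-G'."""
    sgrna, dna = sgrna.upper(), dna.upper()
    if len(sgrna) != len(dna):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [s + d for s, d in zip(sgrna, dna)]


def classify_token(token: str) -> str:
    """Label a pair token as match, mismatch, or one of the two indel types."""
    s, d = token[0], token[1]
    if s == GAP and d == GAP:
        raise ValueError("a position cannot be a gap in both sequences")
    if s == GAP:
        return "dna_bulge"  # extra nucleotide on the DNA side
    if d == GAP:
        return "rna_bulge"  # extra nucleotide on the sgRNA side
    return "match" if s == d else "mismatch"


def summarize(sgrna: str, dna: str) -> Dict[str, int]:
    """Count how many aligned positions fall into each category."""
    counts = {"match": 0, "mismatch": 0, "dna_bulge": 0, "rna_bulge": 0}
    for token in encode_pair(sgrna, dna):
        counts[classify_token(token)] += 1
    return counts


if __name__ == "__main__":
    # Toy aligned pair: one mismatch (C vs G) and one DNA bulge ('-' vs G).
    print(encode_pair("AC-T", "AGGT"))
    print(summarize("AC-T", "AGGT"))
```

In a pipeline of this kind, the pair tokens would be mapped to integer ids and fed to a sequence model, which is what would allow position-dependent patterns to be inspected afterwards. The sketch does no special handling of ambiguous bases such as `N`, so those would be counted as mismatches.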

Keywords: Adaptive batch-wise class balancing; BERT; CRISPR/Cas9; Deep learning; Off-target.

MeSH terms

  • CRISPR-Cas Systems*
  • Gene Editing
  • Genome
  • Genomics
  • RNA, Guide, CRISPR-Cas Systems*

Substances

  • RNA, Guide, CRISPR-Cas Systems