Using traditional machine learning and deep learning methods for on- and off-target prediction in CRISPR/Cas9: a review

Brief Bioinform. 2023 May 19;24(3):bbad131. doi: 10.1093/bib/bbad131.

Abstract

CRISPR/Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated protein 9) is a popular and effective two-component technology for targeted genetic manipulation. It is currently the most versatile and accurate method of gene and genome editing, and it has a wide variety of practical applications. For example, in biomedicine, it has been used in research on cancer, viral infections, pathogen detection, and genetic diseases. Because cleavage may also occur at non-target sequence locations, current CRISPR/Cas9 research relies on data-driven models for on- and off-target prediction. Conventional machine learning and deep learning methods are now routinely applied to predict the on-target knockout efficacy and the off-target profile of a given single-guide RNA (sgRNA). In this paper, we present an overview and a comparative analysis of the traditional machine learning and deep learning models used in CRISPR/Cas9. We highlight the key research challenges and directions associated with target activity prediction. We discuss recent advances in the sgRNA-DNA sequence encoding used in state-of-the-art on- and off-target prediction models. Furthermore, we present the most popular deep neural network architectures used in CRISPR/Cas9 prediction models. Finally, we summarize the existing challenges and discuss possible future investigations in the field of on- and off-target prediction. Our paper provides valuable support for academic and industrial researchers interested in applying machine learning methods to CRISPR/Cas9 genome editing.
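
To make the notion of sgRNA-DNA sequence encoding concrete, the following is a minimal sketch of one simple scheme: one-hot encoding each base, then concatenating the guide and candidate-site vectors position by position. It is an illustration only, not the encoding of any particular model covered by the review. The function names (`one_hot`, `encode_pair`, `mismatch_positions`) and the example sequences are assumptions introduced here.

```python
# Illustrative sketch only: a simple one-hot pair encoding, assumed for
# demonstration; it is not taken from any specific published model.
BASES = "ACGT"

def one_hot(seq):
    """Encode a nucleotide string as a list of 4-element 0/1 rows (A, C, G, T)."""
    rows = []
    for base in seq.upper().replace("U", "T"):  # treat RNA 'U' as 'T'
        # Bases outside ACGT (e.g. N) become an all-zero row.
        rows.append([1 if base == b else 0 for b in BASES])
    return rows

def encode_pair(guide, target):
    """Concatenate guide and target one-hot rows into one 8-channel row per position."""
    if len(guide) != len(target):
        raise ValueError("guide and target must have the same length")
    return [g + t for g, t in zip(one_hot(guide), one_hot(target))]

def mismatch_positions(guide, target):
    """Return 0-based positions where the encoded guide and target rows differ."""
    pairs = zip(one_hot(guide), one_hot(target))
    return [i for i, (g, t) in enumerate(pairs) if g != t]

# Hypothetical 23-nt example: 20-nt protospacer followed by a 3-nt PAM.
guide = "GAGTCCGAGCAGAAGAAGAAGGG"
off_target = "GAGACCGAGCAGAAGAAGAAGGG"  # differs from the guide at position 3

encoded = encode_pair(guide, off_target)
print(len(encoded), len(encoded[0]))
print(mismatch_positions(guide, off_target))
```

A matrix of this kind (sequence length by 8 channels) is the sort of input a convolutional or recurrent network could consume. Other encodings may instead combine the two strands into a single channel set or add explicit mismatch and direction channels.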

Keywords: CRISPR-Cas9; Deep Learning; Genome Editing; Machine Learning; Off-Targets; On-Targets.

Publication types

  • Review
  • Research Support, Non-U.S. Gov't

MeSH terms

  • CRISPR-Cas Systems*
  • Deep Learning*
  • Gene Editing / methods
  • Machine Learning