Computational design of novel Cas9 PAM-interacting domains using evolution-based modelling and structural quality assessment

PLoS Comput Biol. 2023 Nov 17;19(11):e1011621. doi: 10.1371/journal.pcbi.1011621. eCollection 2023 Nov.

Abstract

We present an approach to protein design that combines (i) scarce functional information such as experimental data, (ii) evolutionary information learned from natural sequence variants, and (iii) physics-grounded modeling. Using a Restricted Boltzmann Machine (RBM), we learn a sequence model of a protein family, and we use semi-supervision to leverage the available functional information during RBM training. We then propose a strategy to explore the protein representation space that can be informed by external models such as an empirical force-field method (FoldX). We apply our approach to a domain of the Cas9 protein responsible for recognition of a short DNA motif. We experimentally assess the functionality of 71 variants generated to explore a range of RBM and FoldX energies. Sequences with as many as 50 differences from the wild type (20% of the protein domain) retained functionality. Overall, 21/71 sequences designed with our method were functional, and 6/71 showed improved activity compared with the original wild-type protein sequence. These results demonstrate the value of further exploring the synergies between machine learning of protein sequence representations and physics-grounded modeling strategies informed by structural information.
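To make the notion of an "RBM energy" for a sequence concrete, the following is a minimal sketch, not the authors' implementation. It assumes one-hot encoded aligned sequences, binary (Bernoulli) hidden units, and a 21-letter alphabet (20 amino acids plus a gap); the abstract does not specify any of these choices, and the function names `one_hot`, `rbm_free_energy` and `hamming_distance` are hypothetical. Parameters are random placeholders, not trained values.

```python
import numpy as np

# Assumed alphabet: 20 amino acids plus an alignment gap (not stated in the abstract).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode an aligned sequence as an (L, q) one-hot matrix."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    x[np.arange(len(seq)), [AA_INDEX[a] for a in seq]] = 1.0
    return x


def rbm_free_energy(x, fields, weights, hidden_bias):
    """Free energy of a one-hot sequence under an RBM with binary hidden units.

    x:           (L, q) one-hot sequence
    fields:      (L, q) visible fields g_i(a)
    weights:     (M, L, q) couplings w_mu,i(a) between hidden unit mu and site i
    hidden_bias: (M,) hidden biases b_mu

    F(x) = -sum_i g_i(x_i) - sum_mu log(1 + exp(b_mu + I_mu(x))),
    where I_mu(x) = sum_i w_mu,i(x_i). Lower F means higher model probability.
    """
    hidden_inputs = np.tensordot(weights, x, axes=([1, 2], [0, 1]))  # shape (M,)
    visible_term = -np.sum(fields * x)
    hidden_term = -np.sum(np.logaddexp(0.0, hidden_bias + hidden_inputs))
    return visible_term + hidden_term


def hamming_distance(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wild_type = "MKTAYIAKQR"   # toy sequence, not the Cas9 domain
    variant = "MKTAYLAKQK"     # toy variant with two substitutions
    L, q, M = len(wild_type), len(AMINO_ACIDS), 5

    # Random placeholder parameters standing in for a trained model.
    fields = rng.normal(size=(L, q))
    weights = rng.normal(scale=0.1, size=(M, L, q))
    hidden_bias = rng.normal(size=M)

    for name, seq in [("wild type", wild_type), ("variant", variant)]:
        f = rbm_free_energy(one_hot(seq), fields, weights, hidden_bias)
        print(name, hamming_distance(seq, wild_type), f)
```

In a design setting of the kind described above, a score like this could be reported alongside an externally computed structural energy (for example from FoldX) and the number of differences from the wild type; how the two energies are combined during exploration is not detailed in the abstract.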

MeSH terms

  • Amino Acid Sequence
  • CRISPR-Cas Systems*
  • Learning
  • Machine Learning
  • Proteins* / chemistry
  • Proteins* / genetics

Substances

  • Proteins

Grants and funding

SC and RM were supported by the Agence Nationale de la Recherche, grant numbers ANR-17-CE30-0021 RBMPro and ANR-19-CE30-0021 Decrypted. CM is the recipient of PhD funding from the AMX program, École polytechnique, and benefits from financial support from the Centre de Recherche Interdisciplinary (CRI) through “École Doctorale Frontières de l’Innovation en Recherche et Education – Programme Bettencourt”. DB, WR and FD were supported by the European Research Council [677823], the European Research Council [101044479], and the Agence Nationale de la Recherche [ANR-10-LABX-62-IBEID]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.