Polymer design via SHAP and Bayesian machine learning optimizes pDNA and CRISPR ribonucleoprotein delivery

Chem Sci. 2024 Apr 22;15(19):7219-7228. doi: 10.1039/d3sc06920f. eCollection 2024 May 15.

Abstract

We present the facile synthesis of a clickable polymer library with systematic variations in length, binary composition, pKa, and hydrophobicity (clog P) to optimize intracellular pDNA and CRISPR-Cas9 ribonucleoprotein (RNP) performance. We couple physicochemical characterization and machine learning to interpret quantitative structure-property relationships within the combinatorial design space. For the first time, we reveal unexpectedly disparate design parameters for nucleic acid carriers: via explainable machine learning on 432 formulations, we discover that lower polymer pKa and higher percentages of benzimidazole ethanethiol enhance pDNA delivery, whereas polymer length and captamine cation identity improve RNP delivery. Closed-loop Bayesian optimization of 552 formulation ratios further enhances in vitro performance. In vivo, the top three polymers yield a higher signal and stable transgene expression over 20 days, with a 1.7-fold enhancement over controls. Our facile coupling of synthesis, characterization, and machine learning analysis provides powerful tools to quantitate performance parameters, accelerating the development of next-generation vehicles for nucleic acid medicines.
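
As a rough illustration of the attribution idea behind SHAP-style explainable machine learning, the sketch below computes exact Shapley values for a toy scoring function over four polymer descriptors. Everything in it is an assumption made for illustration: the feature names, the linear `toy_model`, its coefficients, and the example values are hypothetical and are not taken from the study, which does not specify its model in this abstract. Features left out of a coalition are replaced by a baseline value, which is one common convention among several.

```python
from itertools import combinations
from math import factorial

# Hypothetical descriptor names, loosely following the abstract's design variables.
FEATURES = ["length", "pct_benzimidazole", "pKa", "clogP"]


def toy_model(x):
    """Invented linear score; coefficients are placeholders, not fitted values."""
    return (0.02 * x["length"]
            + 0.5 * x["pct_benzimidazole"]
            - 0.3 * x["pKa"]
            + 0.1 * x["clogP"])


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every coalition of the other features.

    Features outside a coalition take their baseline value.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = {g: x[g] if (g in subset or g == f) else baseline[g]
                          for g in FEATURES}
                without_f = {g: x[g] if g in subset else baseline[g]
                             for g in FEATURES}
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi


# Hypothetical formulation and reference point (made-up numbers).
x = {"length": 60, "pct_benzimidazole": 0.4, "pKa": 6.5, "clogP": 1.2}
baseline = {"length": 40, "pct_benzimidazole": 0.2, "pKa": 7.5, "clogP": 0.8}

phi = shapley_values(toy_model, x, baseline)
for name in FEATURES:
    print(f"{name}: {phi[name]:+.3f}")
```

The attributions sum to the difference between the model output at `x` and at `baseline`, which is the property that makes per-feature contributions comparable across formulations. In practice a library implementation would approximate these values for a trained model rather than enumerate coalitions.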