Reaction-based machine learning representations for predicting the enantioselectivity of organocatalysts

Simone Gallarati; Raimon Fabregat; Rubén Laplaza; Sinjini Bhattacharjee; Matthew D Wodrich; Clemence Corminboeuf

doi:10.1039/d1sc00482d

Chem Sci. 2021 Apr 3;12(20):6879-6889. doi: 10.1039/d1sc00482d.

Authors

Simone Gallarati¹, Raimon Fabregat¹, Rubén Laplaza¹,², Sinjini Bhattacharjee¹,³, Matthew D Wodrich¹,², Clemence Corminboeuf¹,²,⁴

Affiliations

¹ Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland clemence.corminboeuf@epfl.ch.
² National Center for Competence in Research-Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland.
³ Indian Institute of Science Education and Research Dr Homi Bhabha Rd, Ward No. 8, NCL Colony, Pashan Pune Maharashtra 411008 India.
⁴ National Center for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne Switzerland.

Abstract

Hundreds of catalytic methods are developed each year to meet the demand for high-purity chiral compounds. The computational design of enantioselective organocatalysts remains a significant challenge, as catalysts are typically discovered through experimental screening. Recent advances in combining quantum chemical computations and machine learning (ML) hold great potential to propel the next leap forward in asymmetric catalysis. Within the context of quantum chemical machine learning (QML, or atomistic ML), the ML representations used to encode the three-dimensional structure of molecules and evaluate their similarity cannot easily capture the subtle energy differences that govern enantioselectivity. Here, we present a general strategy for improving molecular representations within an atomistic machine learning model to predict the DFT-computed enantiomeric excess of asymmetric propargylation organocatalysts solely from the structure of catalytic cycle intermediates. Mean absolute errors as low as 0.25 kcal mol⁻¹ were achieved in predictions of the activation energy with respect to DFT computations. By virtue of its design, this strategy is generalisable to other ML models, to experimental data and to any catalytic asymmetric reaction, enabling the rapid screening of structurally diverse organocatalysts from available structural information.
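
To illustrate why small energy errors matter for enantioselectivity, the sketch below converts a difference in activation free energy between the two competing enantiomeric pathways into an enantiomeric excess. It uses the standard transition-state-theory relation ee = tanh(ΔΔG‡ / 2RT), which is not stated in the abstract and may differ from the exact procedure used in the paper. The function name `ee_from_ddg` and the temperature of 298.15 K are assumptions made only for this illustration.

```python
import math

# Gas constant in kcal mol^-1 K^-1.
R_KCAL = 1.987204e-3


def ee_from_ddg(ddg_kcal, temperature_k=298.15):
    """Return the enantiomeric excess as a fraction in [-1, 1].

    ddg_kcal is the activation free energy difference between the two
    enantiomeric pathways, in kcal/mol. The relation assumes kinetic
    control and equal prefactors for both pathways (an assumption of
    this sketch, not a statement about the paper's workflow).
    """
    return math.tanh(ddg_kcal / (2.0 * R_KCAL * temperature_k))


if __name__ == "__main__":
    # Assumed, illustrative energy differences; not data from the paper.
    error = 0.25  # kcal/mol, the lowest mean absolute error quoted in the abstract
    for ddg in (0.5, 1.0, 2.0):
        low = 100.0 * ee_from_ddg(ddg - error)
        mid = 100.0 * ee_from_ddg(ddg)
        high = 100.0 * ee_from_ddg(ddg + error)
        print(f"ddG = {ddg:.2f} kcal/mol: ee = {mid:.1f}% "
              f"(range {low:.1f}% to {high:.1f}% for +/- {error} kcal/mol)")
```

Because the hyperbolic tangent saturates, a fixed energy error shifts the predicted ee most when ΔΔG‡ is small and much less when the catalyst is already highly selective.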