A Mixed Quantum Chemistry/Machine Learning Approach for the Fast and Accurate Prediction of Biochemical Redox Potentials and Its Large-Scale Application to 315 000 Redox Reactions

ACS Cent Sci. 2019 Jul 24;5(7):1199-1210. doi: 10.1021/acscentsci.9b00297. Epub 2019 Jun 7.

Abstract

A quantitative understanding of the thermodynamics of biochemical reactions is essential for accurately modeling metabolism. The group contribution method (GCM) is one of the most widely used approaches to estimate standard Gibbs energies and redox potentials of reactions for which no experimental measurements exist. Previous work has shown that quantum chemical predictions of biochemical thermodynamics are a promising approach to overcome the limitations of GCM. However, the quantum chemistry approach is significantly more expensive. Here, we use a combination of quantum chemistry and machine learning to obtain a fast and accurate method for predicting the thermodynamics of biochemical redox reactions. We focus on predicting the redox potentials of carbonyl functional group reductions to alcohols and amines, two of the most ubiquitous carbon redox transformations in biology. Our method relies on semiempirical quantum chemistry calculations calibrated with Gaussian process (GP) regression against available experimental data and results in higher predictive power than the GCM at low computational cost. Direct calibration of GCM and fingerprint-based predictions (without quantum chemistry) with GP regression also results in significant improvements in prediction accuracy, demonstrating the versatility of the approach. We design and implement a network expansion algorithm that iteratively reduces and oxidizes a set of natural seed metabolites and demonstrate the high-throughput applicability of our method by predicting the standard potentials of more than 315 000 redox reactions involving approximately 70 000 compounds. Additionally, we develop a novel fingerprint-based framework for detecting molecular environment motifs that are enriched or depleted across different regions of the redox potential landscape. We provide open access to all source code and data generated.
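
The network expansion idea described above (iteratively reducing and oxidizing a set of seed metabolites) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy representation in which a compound is a tuple of functional-group tokens, and it covers only the carbonyl/alcohol interconversion, not the reduction to amines. The names `reduce_once`, `oxidize_once`, and `expand_network` are hypothetical; a real implementation would presumably apply reaction rules to molecular structures with a cheminformatics toolkit.

```python
# Toy network expansion: compounds are tuples of functional-group tokens.
# "C=O" stands for a carbonyl and "C-OH" for an alcohol (an assumption made
# for illustration only; real molecules need a structure-based representation).

def reduce_once(compound):
    """Yield each product of reducing exactly one carbonyl to an alcohol."""
    for i, group in enumerate(compound):
        if group == "C=O":
            yield compound[:i] + ("C-OH",) + compound[i + 1:]

def oxidize_once(compound):
    """Yield each product of oxidizing exactly one alcohol to a carbonyl."""
    for i, group in enumerate(compound):
        if group == "C-OH":
            yield compound[:i] + ("C=O",) + compound[i + 1:]

def expand_network(seeds, max_iterations):
    """Iteratively reduce and oxidize compounds, starting from the seeds.

    Returns (compounds, reactions), where each reaction is stored as an
    (oxidized_form, reduced_form) pair so that duplicates collapse.
    """
    compounds = set(seeds)
    reactions = set()
    frontier = set(seeds)
    for _ in range(max_iterations):
        discovered = set()
        for compound in frontier:
            for product in reduce_once(compound):
                reactions.add((compound, product))
                if product not in compounds:
                    discovered.add(product)
            for product in oxidize_once(compound):
                reactions.add((product, compound))
                if product not in compounds:
                    discovered.add(product)
        if not discovered:
            break  # no new compounds: the network is closed
        compounds |= discovered
        frontier = discovered
    return compounds, reactions

if __name__ == "__main__":
    seed = ("C=O", "C-OH")  # hypothetical seed with one carbonyl, one alcohol
    compounds, reactions = expand_network([seed], max_iterations=5)
    print(len(compounds), "compounds,", len(reactions), "redox reactions")
```

In this sketch each reaction is keyed by its oxidized and reduced forms, so reaching the same redox pair from either direction counts once. Each resulting pair would then be passed to a potential predictor such as the calibrated model the abstract describes.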