Leveraging DFT and Molecular Fragmentation for Chemically Accurate pKa Prediction Using Machine Learning

J Chem Inf Model. 2024 Feb 12;64(3):712-723. doi: 10.1021/acs.jcim.3c01923. Epub 2024 Feb 1.

Abstract

We present a quantum mechanical/machine learning (ML) framework based on random forest to accurately predict the pKa values of complex organic molecules using inexpensive density functional theory (DFT) calculations. By including physics-based features from low-level DFT calculations and structural features from our connectivity-based hierarchy (CBH) fragmentation protocol, we can correct the systematic error associated with DFT. The generalizability and performance of our model are evaluated on two benchmark sets (SAMPL6 and Novartis). We believe the carefully curated input of physics-based features lessens the model's data dependence and its need for complex deep learning architectures, without compromising accuracy on the test sets. As a point of novelty, our work extends the applicability of CBH, employing it to generate viable molecular descriptors for ML.
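
One common way to realize a correction of this kind is delta learning, in which the regressor is trained on the residual between the experimental value and the low-level DFT estimate. The abstract does not state the exact training target, so the form below is an assumption and not the authors' stated formulation. The symbols x_phys (physics-based DFT features), x_CBH (CBH fragment-derived structural features), and f_RF (the random forest regressor) are illustrative names introduced here.

```latex
% Assumed delta-learning form; symbols are illustrative, not taken from the paper.
\begin{align}
\Delta_i &= \mathrm{p}K_{\mathrm{a},i}^{\mathrm{exp}} - \mathrm{p}K_{\mathrm{a},i}^{\mathrm{DFT}}
  && \text{(training residual for molecule } i\text{)} \\
f_{\mathrm{RF}} &: \left(\mathbf{x}_{\mathrm{phys},i},\, \mathbf{x}_{\mathrm{CBH},i}\right) \mapsto \Delta_i
  && \text{(random forest fit to the residuals)} \\
\widehat{\mathrm{p}K_{\mathrm{a}}} &= \mathrm{p}K_{\mathrm{a}}^{\mathrm{DFT}}
  + f_{\mathrm{RF}}\!\left(\mathbf{x}_{\mathrm{phys}},\, \mathbf{x}_{\mathrm{CBH}}\right)
  && \text{(corrected prediction)}
\end{align}
```

Under this reading, the model only has to learn the systematic part of the DFT error, which would be consistent with the reduced data dependence the abstract describes.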

MeSH terms

  • Machine Learning
  • Models, Chemical*
  • Quantum Theory*
  • Thermodynamics