Development and test of highly accurate endpoint free energy methods. 3: partition coefficient prediction using a Poisson-Boltzmann method combined with a solvent accessible surface area model for SAMPL challenges

Phys Chem Chem Phys. 2023 Dec 21;26(1):85-94. doi: 10.1039/d3cp04174c.

Abstract

Accurately predicting solvation free energy is key to predicting protein-ligand binding free energy. In addition, the partition coefficient (log P), an important physicochemical property that determines the distribution of a drug in vivo, can be derived directly from transfer free energies, i.e., the difference between solvation free energies (SFEs) in different solvents. Within the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) 9 challenge, we applied the Poisson-Boltzmann (PB) surface area (SA) approach to predict the toluene/water transfer free energy and partition coefficient (log P_toluene/water) from SFEs. For each solute, only a single conformation, generated automatically by the free software Open Babel, was used. The PB calculation directly adopts our previously optimized boundary definition: a set of general AMBER force field 2 (GAFF2) atom-type-based sphere radii for solute atoms. For the non-polar SA model, we developed new solvent-specific molecular surface tension parameters γ and offsets b for toluene and cyclohexane, fitted to experimental SFEs. This approach achieved the lowest root mean square error (RMSE) among all 18 submissions in the SAMPL9 blind prediction challenge: 1.52 kcal mol^-1 in transfer free energy for 16 small drug molecules. Re-evaluating the challenge set with multi-conformation strategies based on molecular dynamics (MD) simulations further reduced the prediction RMSE to 1.33 kcal mol^-1. An additional evaluation of our PBSA method on the SAMPL5 cyclohexane/water distribution coefficient (log D_cyclohexane/water) prediction showed that our model outperformed COSMO-RS, the best submitted model, with RMSE_PBSA = 1.88 versus RMSE_COSMO-RS = 2.11 log units. Two external log P_toluene/water and log P_cyclohexane/water datasets, containing 110 and 87 data points, respectively, were collected for extra validation and provide in-depth insight into the error sources of the PBSA method.
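
The relationship between SFEs, transfer free energy and log P described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the non-polar term has the linear form γ·SA + b, that the total SFE is the sum of the polar PB term and this non-polar term, and that the temperature is 298.15 K (not stated in the abstract). All function names and the numerical inputs in the example are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def nonpolar_sfe(surface_area, gamma, b):
    """Non-polar SA term, assumed linear: gamma * SA + b (kcal/mol)."""
    return gamma * surface_area + b


def solvation_free_energy(polar_pb, surface_area, gamma, b):
    """Total SFE as polar PB term plus non-polar SA term (kcal/mol)."""
    return polar_pb + nonpolar_sfe(surface_area, gamma, b)


def transfer_free_energy(sfe_organic, sfe_water):
    """Water -> organic solvent transfer free energy (kcal/mol)."""
    return sfe_organic - sfe_water


def log_p(sfe_organic, sfe_water, temperature=298.15):
    """log P(organic/water) = -dG_transfer / (RT ln 10).

    The default temperature is an assumption of this sketch.
    """
    dg = transfer_free_energy(sfe_organic, sfe_water)
    return -dg / (R_KCAL * temperature * math.log(10))


def rmse(predicted, experimental):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(experimental):
        raise ValueError("sequences must have the same length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical inputs for illustration only, not values from the paper.
    sfe_toluene = solvation_free_energy(-2.0, 300.0, gamma=-0.02, b=1.0)
    sfe_water = solvation_free_energy(-6.0, 300.0, gamma=0.01, b=0.5)
    print(log_p(sfe_toluene, sfe_water))
```

Under the assumed temperature, RT ln 10 is about 1.364 kcal mol^-1, so an RMSE of 1.52 kcal mol^-1 in transfer free energy corresponds to roughly 1.1 log units in log P.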