Using Machine Learning To Predict Partition Coefficient (Log P) and Distribution Coefficient (Log D) with Molecular Descriptors and Liquid Chromatography Retention Time

J Chem Inf Model. 2023 Apr 10;63(7):1906-1913. doi: 10.1021/acs.jcim.2c01373. Epub 2023 Mar 16.

Abstract

During preclinical evaluations of drug candidates, several physicochemical (p-chem) properties are measured and employed as metrics to estimate drug efficacy in vivo. Two such p-chem properties are the octanol-water partition coefficient, Log P, and distribution coefficient, Log D, which are useful in estimating the distribution of drugs within the body. Log P and Log D are traditionally measured using the shake-flask method and high-performance liquid chromatography. However, it is challenging to measure these properties for species that are very hydrophobic (or hydrophilic) owing to the very low equilibrium concentrations partitioned into octanol (or aqueous) phases. Moreover, the shake-flask method is relatively time-consuming and can require multistep dilutions as the range of analyte concentrations can differ by several orders of magnitude. Here, we circumvent these limitations by using machine learning (ML) to correlate Log P and Log D with liquid chromatography (LC) retention time (RT). Predictive models based on four ML algorithms, which used molecular descriptors and LC RTs as features, were extensively tested and compared. The inclusion of RT as an additional descriptor improves model performance (MAE = 0.366 and R² = 0.89), and Shapley additive explanations analysis indicates that RT has the highest impact on model accuracy.
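The two performance metrics quoted above, mean absolute error (MAE) and the coefficient of determination (R²), can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the four ML algorithms, so a one-feature least-squares line of Log P against RT stands in for a model here, and the retention times and Log P values are invented purely for illustration. The function names (`mean_absolute_error`, `r_squared`, `fit_line`) are likewise assumptions of this sketch.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical values, NOT taken from the paper or its figshare data set.
rt_minutes = [1.2, 2.5, 3.1, 4.8, 6.0, 7.4]  # LC retention times
log_p = [-0.5, 0.6, 1.1, 2.4, 3.2, 4.5]      # measured Log P

# Stand-in model: a straight line of Log P against RT.
slope, intercept = fit_line(rt_minutes, log_p)
predicted = [slope * rt + intercept for rt in rt_minutes]

mae = mean_absolute_error(log_p, predicted)
r2 = r_squared(log_p, predicted)
print(f"slope={slope:.3f} intercept={intercept:.3f} MAE={mae:.3f} R2={r2:.3f}")
```

In this sketch the metrics are computed on the same points used for fitting, which overstates performance; a held-out test set or cross-validation would be needed to obtain figures comparable to those reported in the abstract.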

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromatography, High Pressure Liquid / methods
  • Chromatography, Liquid
  • Machine Learning*
  • Octanols / chemistry
  • Water* / chemistry

Substances

  • Water
  • Octanols

Associated data

  • figshare/10.6084/m9.figshare.8038913.v1