Building Machine Learning Small Molecule Melting Points and Solubility Models Using CCDC Melting Points Dataset

J Chem Inf Model. 2023 May 22;63(10):2948-2959. doi: 10.1021/acs.jcim.3c00308. Epub 2023 May 1.

Abstract

Predicting the solubility of small molecules is difficult because reliable and consistent experimental solubility data are scarce. It is well known that for a molecule in a crystal lattice to dissolve, it must first dissociate from the lattice and then be solvated. The melting point of a compound is proportional to the lattice energy, and the octanol-water partition coefficient (log P) is a measure of the compound's solvation efficiency. The CCDC's melting point dataset of almost one hundred thousand compounds was used to create widely applicable machine learning models of small molecule melting points. Using the general solubility equation, the aqueous thermodynamic solubilities of the same compounds can be predicted. The global model can easily be localized by adding melting point measurements for a chemical series of interest.
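
For orientation, the general solubility equation referred to in the abstract is commonly written as log S = 0.5 - 0.01 (MP - 25) - log P, where S is the aqueous solubility in mol/L and MP is the melting point in degrees Celsius. The sketch below assumes this widely cited form; the abstract does not say which variant the authors applied, and the function name gse_log_s is hypothetical. It shows how a melting point (measured or model-predicted) and a log P value combine into a solubility estimate.

```python
def gse_log_s(melting_point_c: float, log_p: float) -> float:
    """Estimate log10 aqueous solubility (mol/L) with the general solubility equation.

    Assumes the commonly cited form: log S = 0.5 - 0.01 * (MP - 25) - log P.
    For compounds melting at or below 25 degrees C (liquids at room
    temperature), the melting point term is conventionally set to zero.
    """
    # The crystal lattice term applies only to solids at 25 degrees C.
    lattice_term = 0.01 * max(melting_point_c - 25.0, 0.0)
    return 0.5 - lattice_term - log_p


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the paper.
    print(gse_log_s(melting_point_c=125.0, log_p=2.0))
```

In this form, every 100 degrees C of melting point above 25 degrees C lowers the estimated solubility by one log unit, which is why an accurate melting point model feeds directly into the solubility estimate.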

MeSH terms

  • Machine Learning*
  • Octanols / chemistry
  • Solubility
  • Water* / chemistry

Substances

  • Water
  • Octanols