Analyzing the Accuracy of Critical Micelle Concentration Predictions Using Deep Learning

J Chem Theory Comput. 2023 Oct 24;19(20):7371-7386. doi: 10.1021/acs.jctc.3c00868. Epub 2023 Oct 10.

Abstract

This paper presents a novel approach to predicting critical micelle concentrations (CMCs) by using graph neural networks (GNNs) augmented with Gaussian processes (GPs). The proposed model uses learned latent space representations of molecules to predict CMCs and estimate uncertainties. The performance of the model on a data set containing nonionic, cationic, anionic, and zwitterionic molecules is compared against a linear model that works with extended connectivity fingerprints (ECFPs). The GNN-based model performs slightly better than the linear ECFP model when there is enough well-balanced training data and achieves predictive accuracy that is comparable to published models that were evaluated on a smaller range of surfactant chemistries. We illustrate the applicability domain of our model using a molecular cartogram to visualize the latent space, which helps to identify molecules for which predictions are likely to be erroneous. In addition to accurately predicting CMCs for some surfactant classes, the proposed approach can provide valuable insights into the molecular properties that influence CMCs.
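
As a rough illustration of how a Gaussian process can turn latent molecular representations into predictions with uncertainty estimates, the following minimal sketch performs exact GP regression on fixed latent vectors. It is not the authors' implementation. The latent vectors are hand-written placeholders rather than GNN outputs, and the RBF kernel, its hyperparameters, the noise level, and the toy log10 CMC values are all assumptions chosen purely for illustration.

```python
import math

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two latent vectors (assumed kernel)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return variance * math.exp(-0.5 * sq_dist / length_scale ** 2)

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                low[i][j] = (matrix[i][j] - s) / low[j][j]
    return low

def solve_lower(low, b):
    """Solve low @ x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        s = sum(low[i][k] * x[k] for k in range(i))
        x.append((b[i] - s) / low[i][i])
    return x

def solve_lower_transposed(low, b):
    """Solve low.T @ x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / low[i][i]
    return x

def gp_predict(train_z, train_y, test_z, noise=1e-2):
    """Exact GP regression: return (means, stds) for each test latent vector."""
    n = len(train_z)
    y_mean = sum(train_y) / n
    centered = [y - y_mean for y in train_y]  # constant prior mean = training mean
    gram = [[rbf_kernel(train_z[i], train_z[j]) + (noise if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    low = cholesky(gram)
    alpha = solve_lower_transposed(low, solve_lower(low, centered))
    means, stds = [], []
    for z in test_z:
        k_star = [rbf_kernel(z, zi) for zi in train_z]
        means.append(y_mean + sum(k * a for k, a in zip(k_star, alpha)))
        v = solve_lower(low, k_star)
        var = rbf_kernel(z, z) - sum(x * x for x in v)
        stds.append(math.sqrt(max(var, 0.0)))
    return means, stds

# Placeholder 2-D latent vectors and toy log10(CMC) targets (not real data).
train_z = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_y = [-2.0, -2.5, -3.0, -3.5]

# One query inside the training region, one far outside it.
test_z = [[0.0, 0.0], [10.0, 10.0]]
means, stds = gp_predict(train_z, train_y, test_z)

for z, m, s in zip(test_z, means, stds):
    print(f"latent={z}  predicted log10(CMC)={m:.3f}  std={s:.3f}")
```

In this sketch the query far from the training points reverts to the training mean with a predictive standard deviation close to the prior value, which mirrors the idea that predictions for molecules outside the region of latent space covered by the training data should be treated as less reliable.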