Two-Dimensional Energy Histograms as Features for Machine Learning to Predict Adsorption in Diverse Nanoporous Materials

J Chem Theory Comput. 2023 Jul 25;19(14):4568-4583. doi: 10.1021/acs.jctc.2c00798. Epub 2023 Feb 3.

Abstract

A major obstacle for machine learning (ML) in chemical science is the lack of physically informed feature representations that provide both accurate prediction and easy interpretability of the ML model. In this work, we describe adsorption systems using novel two-dimensional energy histogram (2D-EH) features, which are obtained from the probe-adsorbent energies and energy gradients at grid points located throughout the adsorbent. The 2D-EH features encode both energetic and structural information about the material and lead to highly accurate ML models (coefficient of determination R² ∼ 0.94–0.99) for predicting single-component adsorption capacity in metal-organic frameworks (MOFs). We consider the adsorption of spherical molecules (Kr and Xe), linear alkanes with a wide range of aspect ratios (ethane, propane, n-butane, and n-hexane), and a branched alkane (2,2-dimethylbutane) over a wide range of temperatures and pressures. The interpretable 2D-EH features enable the ML model to learn the basic physics of adsorption in pores from the training data. We show that these ML models, trained on MOF data, are transferable to different families of amorphous nanoporous materials. We also identify several adsorption systems in which capillary condensation occurs and ML predictions are more challenging. Even for these systems, our 2D-EH features outperform structural features, including those derived from persistent homology. The novel 2D-EH features may help accelerate the ML-driven discovery and design of advanced nanoporous materials for gas storage and separation.
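As an illustration of the 2D-EH idea described above, the following is a minimal sketch of how a two-dimensional histogram over (energy, energy-gradient magnitude) pairs at grid points could be turned into a fixed-length feature vector. It is not the authors' implementation: the function name `energy_histogram_2d`, the bin edges, the clipping of out-of-range values into the outermost bins, the normalization by the number of grid points, and the random toy energy field are all assumptions made for this example. In practice the energy grid would come from a probe-adsorbent force-field calculation.

```python
import numpy as np


def energy_histogram_2d(energies, grad_norms, energy_edges, grad_edges):
    """Flattened, normalized 2D histogram of (energy, |gradient|) pairs.

    Hypothetical helper: binning and normalization choices are assumptions,
    not taken from the paper.
    """
    energies = np.asarray(energies, dtype=float).ravel()
    grad_norms = np.asarray(grad_norms, dtype=float).ravel()
    # Clip so out-of-range values land in the first or last bin
    # instead of being silently dropped by histogram2d.
    e = np.clip(energies, energy_edges[0], energy_edges[-1])
    g = np.clip(grad_norms, grad_edges[0], grad_edges[-1])
    counts, _, _ = np.histogram2d(e, g, bins=[energy_edges, grad_edges])
    # Normalize by the number of grid points so features are
    # comparable between materials with different grid sizes.
    return (counts / energies.size).ravel()


# Toy energy field on a 16 x 16 x 16 grid (random numbers standing in
# for probe-adsorbent energies; units are arbitrary).
rng = np.random.default_rng(0)
energy_grid = rng.normal(loc=-5.0, scale=3.0, size=(16, 16, 16))

# Finite-difference gradient magnitude with an assumed grid spacing of 1.0.
gx, gy, gz = np.gradient(energy_grid, 1.0)
grad_norm_grid = np.sqrt(gx**2 + gy**2 + gz**2)

energy_edges = np.linspace(-20.0, 0.0, 11)  # 10 energy bins (assumed range)
grad_edges = np.linspace(0.0, 10.0, 6)      # 5 gradient bins (assumed range)

features = energy_histogram_2d(
    energy_grid, grad_norm_grid, energy_edges, grad_edges
)
```

The resulting vector has one entry per (energy bin, gradient bin) pair and could be passed, together with state variables such as temperature and pressure, to any standard regression model. Each entry is the fraction of grid points falling in that bin, so individual features remain physically interpretable.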