Transferable MP2-Based Machine Learning for Accurate Coupled-Cluster Energies

J Chem Theory Comput. 2020 Dec 8;16(12):7453-7461. doi: 10.1021/acs.jctc.0c00927. Epub 2020 Nov 2.

Abstract

Machine learning methods have enabled the low-cost evaluation of molecular properties such as energy at an unprecedented scale. While many such applications have focused on molecular input based on geometry, few studies consider representations based on the underlying electronic structure. Directing attention to the electronic structure poses a distinct challenge, but it allows for a more detailed representation of the underlying physics and how it affects molecular properties. The target of this work is to efficiently encode a lower-cost correlated wave function derived from MP2 to predict a higher-cost coupled-cluster singles-and-doubles (CCSD) wave function based on correlation-pair energies and the contributing electron promotions (excitations) and integrals. The new molecular representation explores the short-range behavior of electron correlation and uses distinct models that differentiate between two-electron promotions from the same molecular orbital and those from two different orbitals. We present a re-engineered set of input features that provide an intuitive description of the orbital properties involved in electron correlation. The overall models are found to be highly transferable and size extensive, requiring very few training instances to approach chemical accuracy for a broad spectrum of organic molecules. The efficiency and transferability of the novel representation are demonstrated on a series of linear hydrocarbons, the potential energy surface of the water dimer, and the GDB-9 database. For the GDB-9 database, we found that data from only 140 randomly selected molecules are adequate to achieve chemical accuracy for more than 133 000 organic molecules.
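
The pair-based idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the regression model or the exact input features, so the code below assumes a plain linear least-squares fit as a stand-in, uses three hypothetical per-pair features, and runs on synthetic numbers rather than real MP2 or CCSD data. What it does show is the structure of the approach: one model for pairs where both electrons leave the same orbital (diagonal pairs), a second model for pairs from two different orbitals (off-diagonal pairs), and a total correlation energy obtained by summing the predicted pair energies, which is what makes the scheme additive over pairs.

```python
import numpy as np


def _with_bias(features):
    """Append a constant column so the linear model has an intercept."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_linear(features, targets):
    """Least-squares fit; returns the weight vector (intercept last)."""
    weights, *_ = np.linalg.lstsq(_with_bias(features), targets, rcond=None)
    return weights


def fit_pair_models(features, is_diagonal, ccsd_pair_energies):
    """Fit separate models for diagonal and off-diagonal orbital pairs."""
    return {
        "diagonal": fit_linear(features[is_diagonal],
                               ccsd_pair_energies[is_diagonal]),
        "off_diagonal": fit_linear(features[~is_diagonal],
                                   ccsd_pair_energies[~is_diagonal]),
    }


def predict_correlation_energy(models, features, is_diagonal):
    """Predict each pair energy with its own model, then sum over pairs."""
    pair_energies = np.empty(features.shape[0])
    pair_energies[is_diagonal] = (
        _with_bias(features[is_diagonal]) @ models["diagonal"])
    pair_energies[~is_diagonal] = (
        _with_bias(features[~is_diagonal]) @ models["off_diagonal"])
    return pair_energies.sum()


# Synthetic demonstration data (NOT real quantum-chemistry output).
# Each row stands for one occupied-orbital pair; the three columns are
# hypothetical MP2-derived features.
rng = np.random.default_rng(0)
n_pairs = 40
features = rng.normal(size=(n_pairs, 3))
is_diagonal = np.arange(n_pairs) % 4 == 0  # every fourth pair is "diagonal"

# Hypothetical ground-truth weights used only to generate toy targets.
true_w_diag = np.array([1.1, 0.2, -0.1, 0.01])
true_w_off = np.array([0.9, -0.3, 0.05, -0.02])
design = _with_bias(features)
targets = np.where(is_diagonal, design @ true_w_diag, design @ true_w_off)

models = fit_pair_models(features, is_diagonal, targets)
predicted_total = predict_correlation_energy(models, features, is_diagonal)
reference_total = targets.sum()
```

Because the toy targets are exactly linear in the features, the two fitted models recover the generating weights and `predicted_total` agrees with `reference_total` to numerical precision. With real data, any regressor could replace `fit_linear`, and the agreement would depend on the features and on the amount of training data.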