Graph Theoretic Molecular Fragmentation for Multidimensional Potential Energy Surfaces Yield an Adaptive and General Transfer Machine Learning Protocol

J Chem Theory Comput. 2022 Sep 13;18(9):5125-5144. doi: 10.1021/acs.jctc.1c01241. Epub 2022 Aug 22.

Abstract

Over a series of publications, we have introduced a graph-theoretic description of molecular fragmentation. Here, a system is divided into a set of nodes, or vertices, that are then connected through edges, faces, and higher-order simplexes to represent a collection of spatially overlapping and locally interacting subsystems. Each such subsystem is treated at two levels of electronic structure theory, and the results are used to construct many-body expansions that are then embedded within an ONIOM scheme. These expansions converge rapidly with the many-body order (or graphical rank) of the subsystems and have previously been used for ab initio molecular dynamics (AIMD) calculations and for computing multidimensional potential energy surfaces. Specifically, in all these cases we have shown that CCSD- and MP2-level AIMD trajectories and potential surfaces may be obtained at density functional theory cost. The approach has been demonstrated for gas-phase studies, for condensed-phase electronic structure, and for basis set extrapolation-based AIMD. Recently, this approach has also been used to derive new quantum-computing algorithms that enormously reduce the quantum circuit depth in a circuit-based computation of correlated electronic structure. In this publication, we introduce (a) a family of neural networks that act in parallel to efficiently represent the post-Hartree-Fock electronic structure energy contributions for all simplexes (fragments), and (b) a new k-means-based tessellation strategy to glean training data for high-dimensional molecular spaces and minimize the extent of training needed to construct this family of neural networks. The approach is particularly useful when coupled cluster accuracy is desired and when fragment sizes grow in order to capture nonlocal interactions accurately.
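As a rough illustration of the structure described above, a two-level, ONIOM-embedded many-body expansion over simplexes can be written schematically as follows. This is a generic sketch, not the exact expression used by the authors: the symbols $E^{\mathrm{low}}$, $E^{\mathrm{high}}$, $V_r$, $R$, and $c_\alpha$ are notation assumed here for illustration.

```latex
% Schematic two-level fragment expansion (assumed notation):
%   E^{low}, E^{high} : lower and higher levels of electronic structure theory
%   V_r               : set of rank-r simplexes (r = 0 nodes, r = 1 edges, ...)
%   R                 : maximum graphical rank retained
%   c_\alpha          : integer weights that correct for the overcounting
%                       of overlapping simplexes
\begin{equation}
  E \;\approx\; E^{\mathrm{low}}(\text{full system})
  \;+\; \sum_{r=0}^{R} \sum_{\alpha \in V_r}
        c_{\alpha}\,
        \bigl[\, E^{\mathrm{high}}(\alpha) - E^{\mathrm{low}}(\alpha) \,\bigr]
\end{equation}
```

In this picture, each fragment contributes a correction term, and it is plausible to read the family of neural networks in (a) as one model per simplex for its post-Hartree-Fock energy contribution, evaluated in parallel and then summed.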
The multidimensional k-means tessellation/clustering algorithm used to determine the training data for all fragments is shown to be extremely efficient: it reduces the required training set to only 10% of the data for each fragment while still yielding accurate neural networks. These fully connected dense neural networks are then used to extrapolate the potential energy surface for all molecular fragments, and the fragment surfaces are combined as per our graph-theoretic procedure to transfer the learning to the full-system energy for the entire AIMD trajectory, at less than one-tenth the cost of a regular fragmentation-based AIMD calculation.
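The idea of using k-means clustering to pick a small training subset can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' tessellation algorithm: the function name `kmeans_select`, the plain Lloyd iteration, the random descriptor array, and the rule "keep the sample nearest each centroid" are all hypothetical choices made for the example. Only the 10% target fraction comes from the text.

```python
import numpy as np

def kmeans_select(points, fraction=0.1, n_iter=50, seed=0):
    """Return indices of representative samples chosen by k-means.

    points   : (n_samples, n_dims) array of fragment descriptors (assumed input)
    fraction : target share of samples to keep for training (10% per the text)
    """
    rng = np.random.default_rng(seed)
    n = len(points)
    k = max(1, int(round(fraction * n)))

    # Initialise centroids from k distinct samples.
    centroids = points[rng.choice(n, size=k, replace=False)].copy()

    for _ in range(n_iter):
        # Squared distance from every sample to every centroid: shape (n, k).
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)

        # Move each centroid to the mean of its members; keep empty ones fixed.
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members):
                new_centroids[j] = members.mean(axis=0)

        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # For each centroid, keep the single sample closest to it.
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.unique(d2.argmin(axis=0))

# Hypothetical usage: 200 geometries of one fragment in a 3-dimensional descriptor space.
points = np.random.default_rng(1).normal(size=(200, 3))
train_idx = kmeans_select(points)
```

The selected indices would then mark the geometries at which the expensive higher-level fragment energies are computed and used to train that fragment's network; the remaining geometries are predicted by the network.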

MeSH terms

  • Algorithms
  • Machine Learning
  • Molecular Dynamics Simulation*
  • Neural Networks, Computer
  • Quantum Theory*