Machine Learning Models of Vibrating H2CO: Comparing Reproducing Kernels, FCHL, and PhysNet

J Phys Chem A. 2020 Oct 22;124(42):8853-8865. doi: 10.1021/acs.jpca.0c05979. Epub 2020 Oct 13.

Abstract

Machine learning (ML) has become a promising tool for improving the quality of atomistic simulations. Using formaldehyde as a benchmark system for intramolecular interactions, a comparative assessment of ML models based on state-of-the-art variants of deep neural networks (NNs), reproducing kernel Hilbert space (RKHS+F), and kernel ridge regression (KRR) is presented. Learning curves for energies and atomic forces indicate rapid convergence toward excellent predictions for B3LYP, MP2, and CCSD(T)-F12 reference results for modestly sized (in the hundreds) training sets. Typically, learning curve offsets decay as one goes from NN (PhysNet) to RKHS+F to KRR (FCHL). Conversely, the predictive power for extrapolation of energies toward new geometries increases in the same order with RKHS+F and FCHL performing almost equally. For harmonic vibrational frequencies, the picture is less clear, with PhysNet and FCHL yielding accuracies of ∼1 and ∼0.2 cm-1, respectively, no matter which reference method, while RKHS+F models level off for B3LYP and exhibit continued improvements for MP2 and CCSD(T)-F12. Finite-temperature molecular dynamics (MD) simulations using the PESs from the three ML methods with identical initial conditions yield indistinguishable infrared spectra with good performance compared with experiment except for the high-frequency modes involving hydrogen stretch motion which is a known limitation of MD for vibrational spectroscopy. For sufficiently large training set sizes, all three models can detect insufficient convergence ("noise") of the reference electronic structure calculations in that the learning curves level off. Transfer learning (TL) from B3LYP to CCSD(T)-F12 with PhysNet indicates that additional improvements in data efficiency can be achieved.