Metadynamics for training neural network model chemistries: A competitive assessment

John E Herr; Kun Yao; Ryker McIntyre; David W Toth; John Parkhill

doi:10.1063/1.5020067

J Chem Phys. 2018 Jun 28;148(24):241710. doi: 10.1063/1.5020067.

Authors

John E Herr¹, Kun Yao¹, Ryker McIntyre¹, David W Toth¹, John Parkhill¹

Affiliation

¹ Department of Chemistry and Biochemistry, The University of Notre Dame du Lac, 251 Nieuwland Science Hall, Notre Dame, Indiana 46556, USA.

PMID: 29960377

Abstract

Neural network model chemistries (NNMCs) promise to facilitate the accurate exploration of chemical space and simulation of large reactive systems. One important path to improving these models is to add layers of physical detail, especially long-range forces. At short range, however, these models are data driven and data limited. Little is systematically known about how data should be sampled, and "test data" chosen randomly from the output of some sampling techniques can provide poor information about generality. If the sampling method is narrow, "test error" can appear encouragingly tiny while the model fails catastrophically elsewhere. In this manuscript, we competitively evaluate two common sampling methods, molecular dynamics (MD) and normal-mode sampling, and one uncommon alternative, metadynamics (MetaMD), for preparing training geometries. We show that MD is an inefficient sampling method in the sense that additional samples do not improve generality. We also show that MetaMD is easily implemented in any NNMC software package, with a cost that scales linearly with the number of atoms in a sample molecule. MetaMD is a black-box way to ensure samples always reach out to new regions of chemical space, while remaining relevant to chemistry near k_BT. It is a cheap tool to address the issue of generalization.
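
The contrast between MD and metadynamics sampling can be illustrated with a toy model. The sketch below is not the authors' implementation: it assumes a one-dimensional double-well potential, overdamped Langevin dynamics, and a history-dependent bias built from Gaussian bumps deposited at previously visited positions. All function names and parameter values (`height`, `width`, `deposit_every`, and so on) are hypothetical choices made for illustration.

```python
import numpy as np


def double_well_force(x):
    """Force -dV/dx for the toy potential V(x) = (x^2 - 1)^2."""
    return -4.0 * x * (x**2 - 1.0)


def bias_force(x, centers, height, width):
    """Force from the bias V_b(x) = sum_k height * exp(-(x - c_k)^2 / (2 width^2))."""
    if not centers:
        return 0.0
    d = x - np.asarray(centers)
    return float(np.sum(height * d / width**2 * np.exp(-d**2 / (2.0 * width**2))))


def sample(n_steps, use_bias, kT=0.1, dt=1e-3, height=0.1, width=0.2,
           deposit_every=100, seed=0):
    """Overdamped Langevin trajectory starting in the left well (x = -1).

    With use_bias=True, a Gaussian bump is deposited at the current position
    every `deposit_every` steps, which pushes the walker away from regions
    it has already visited (the core idea of metadynamics).
    """
    rng = np.random.default_rng(seed)
    x = -1.0
    centers = []
    traj = np.empty(n_steps)
    for step in range(n_steps):
        f = double_well_force(x)
        if use_bias:
            f += bias_force(x, centers, height, width)
            if step % deposit_every == 0:
                centers.append(x)
        # Euler-Maruyama update; the noise amplitude sets the temperature kT.
        x += f * dt + np.sqrt(2.0 * kT * dt) * rng.standard_normal()
        traj[step] = x
    return traj


if __name__ == "__main__":
    md = sample(20000, use_bias=False)
    meta = sample(20000, use_bias=True)
    print("MD range:    ", md.min(), md.max())
    print("MetaMD range:", meta.min(), meta.max())
```

In this toy setting the barrier is ten times k_BT, so the unbiased walker is expected to stay in the well where it started, while the biased walker fills that well with bumps and crosses into the other one. Each bias evaluation here sums over all deposited bumps; how the bias is defined over molecular geometries, and how its cost scales with system size, is specific to the paper and is not reproduced by this sketch.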