Generating Minimal Training Sets for Machine Learned Potentials

Phys Rev Lett. 2024 Apr 19;132(16):167301. doi: 10.1103/PhysRevLett.132.167301.

Abstract

This Letter presents a novel approach to identifying uncorrelated atomic configurations in extensive datasets, using a nonstandard neural network workflow known as random network distillation (RND), for training machine-learned interatomic potentials (MLPs). The method is coupled with a DFT workflow in which initial data are generated with cheaper classical methods, and only the minimal subset is then passed to more computationally expensive ab initio calculations. This benefits training not only by reducing the number of expensive DFT calculations required but also by providing a pathway to the use of more accurate quantum mechanical calculations. The method's efficacy is demonstrated by constructing machine-learned interatomic potentials for the molten salts KCl and NaCl. Our RND method allows accurate models to be fit on minimal datasets, as small as 32 configurations, reducing the number of required structures by at least one order of magnitude compared with alternative methods. These smaller datasets not only substantially reduce the computational overhead of generating training data but also provide a more comprehensive starting point for active-learning procedures.
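The selection principle behind RND-based data curation can be sketched roughly as follows. This is not the authors' implementation: the descriptor representation, network sizes, selection budget, stopping threshold, and the use of scikit-learn are all illustrative assumptions. The sketch greedily selects the configuration whose output under a fixed, randomly initialized "target" network is worst predicted by a "distilled" network trained only on the configurations chosen so far; a large distillation error signals a configuration unlike anything already in the training set.

```python
# Hedged sketch of RND-based greedy configuration selection (illustrative only).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Pool of candidate configurations, each summarised by a fixed-length
# descriptor vector; random data stands in for real descriptors here.
pool = rng.normal(size=(2000, 64))

# Fixed, randomly initialised "target" network: a random projection with a
# nonlinearity. Its weights are never trained.
W = rng.normal(size=(64, 16)) / np.sqrt(64)
b = rng.normal(size=16)
def target(x):
    return np.tanh(x @ W + b)

selected = [int(rng.integers(len(pool)))]  # seed with one random configuration
budget, threshold = 32, 1e-3               # assumed stopping criteria

while len(selected) < budget:
    # Predictor ("distilled") network trained only on the configurations
    # selected so far, to reproduce the target network's outputs.
    predictor = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000,
                             random_state=0)
    predictor.fit(pool[selected], target(pool[selected]))

    # Distillation error over the full pool: a large error means the predictor
    # has seen nothing like this configuration, i.e. it is novel/uncorrelated.
    errors = np.mean((predictor.predict(pool) - target(pool)) ** 2, axis=1)
    errors[selected] = -np.inf             # never re-select a chosen point

    best = int(np.argmax(errors))
    if errors[best] < threshold:           # pool is already well covered
        break
    selected.append(best)

print(f"selected {len(selected)} configurations for ab initio labelling")
```

In a full workflow along the lines described in the abstract, the selected configurations would then be labelled with DFT and used to fit the MLP; the greedy argmax on the distillation error is what keeps the selected set small and mutually uncorrelated.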