Machine Learning Discovery of Computational Model Efficacy Boundaries

Phys Rev Lett. 2020 Aug 21;125(8):085503. doi: 10.1103/PhysRevLett.125.085503.

Abstract

Computational models are formulated in hierarchies of variable fidelity, often with no quantitative rule for defining the fidelity boundaries. We have constructed a dataset from a wide range of atomistic computational models to reveal the accuracy boundary between higher-fidelity models and a simple, lower-fidelity model. The symbolic decision boundary is discovered by optimizing a support vector machine on the data through iterative feature engineering. This data-driven approach reveals two important results: (i) a symbolic rule emerges that is independent of the algorithm, and (ii) the symbolic rule provides a deeper understanding of the fidelity boundary. Specifically, our dataset is composed of radial distribution functions from seven high-fidelity methods that cover wide ranges in the features (element, density, and temperature). The high-fidelity results are compared with a simple pair-potential model to discover the nonlinear combination of the features that sets the boundary, and the machine learning approach directly reveals the central role of atomic physics in determining accuracy.
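
To illustrate how a symbolic rule can be read off a support vector machine, the sketch below fits a linear SVM to logarithmically transformed features and converts the learned weights into a power-law boundary. It is not the authors' pipeline: the two features `x1` and `x2`, the synthetic labels, the invented ground-truth rule (`x1 * x2**-0.5 > 1`), and the hand-written subgradient trainer are all assumptions made for illustration, and only a single feature-engineering step (the log transform) is shown.

```python
import math
import random


def make_synthetic_data(n=400, seed=0):
    """Toy data with an invented ground-truth rule: label +1 when
    x1 * x2**-0.5 > 1, else -1. Points very close to the boundary are
    dropped so the two classes are cleanly separable."""
    rng = random.Random(seed)
    features, labels = [], []
    while len(features) < n:
        log_x1 = rng.uniform(-2.0, 2.0)
        log_x2 = rng.uniform(-2.0, 2.0)
        score = log_x1 - 0.5 * log_x2
        if abs(score) < 0.2:
            continue
        # Engineered features: logarithms turn a power law into a linear rule.
        features.append((log_x1, log_x2))
        labels.append(1 if score > 0 else -1)
    return features, labels


def train_linear_svm(X, y, lam=0.01, lr=0.1, steps=2000):
    """Full-batch subgradient descent on the L2-regularized hinge loss."""
    n = len(X)
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(steps):
        grad_w = [lam * w[0], lam * w[1]]
        grad_b = 0.0
        for (x1, x2), label in zip(X, y):
            # Only points inside the margin (or misclassified) contribute.
            if label * (w[0] * x1 + w[1] * x2 + b) < 1.0:
                grad_w[0] -= label * x1 / n
                grad_w[1] -= label * x2 / n
                grad_b -= label / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b


def accuracy(X, y, w, b):
    """Fraction of points whose predicted sign matches the label."""
    hits = sum(
        1 for (x1, x2), label in zip(X, y)
        if label * (w[0] * x1 + w[1] * x2 + b) > 0
    )
    return hits / len(X)


def symbolic_rule(w, b):
    """Rewrite w0*ln(x1) + w1*ln(x2) + b > 0 as x1 * x2**p > c.
    Dividing by w0 preserves the inequality only if w0 > 0."""
    exponent = w[1] / w[0]
    threshold = math.exp(-b / w[0])
    return exponent, threshold


X, y = make_synthetic_data()
w, b = train_linear_svm(X, y)
exponent, threshold = symbolic_rule(w, b)
print(f"training accuracy: {accuracy(X, y, w, b):.3f}")
print(f"recovered rule: x1 * x2**{exponent:.2f} > {threshold:.2f}")
```

Because the classifier is linear in the log-features, its weights map directly onto exponents of a product of the original features, which is one simple way a decision boundary can be expressed symbolically rather than left inside the fitted model.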