Utilizing Data-Driven Optimization to Automate the Parametrization of Kinetic Monte Carlo Models

J Phys Chem A. 2023 Jul 20;127(28):5967-5978. doi: 10.1021/acs.jpca.3c02482. Epub 2023 Jul 8.

Abstract

Kinetic Monte Carlo (kMC) simulations are a popular tool for investigating the dynamic behavior of stochastic systems. However, one major limitation is their relatively high computational cost. Over the last three decades, significant effort has gone into developing methodologies that improve the runtime efficiency of kMC. Nevertheless, kMC models remain computationally expensive. This is a particular issue for complex systems with several unknown input parameters, where most of the simulation time is often spent finding a suitable parametrization. A potential route for automating the parametrization of kMC models arises from coupling kMC with a data-driven approach. In this work, we equip kMC simulations with a feedback loop consisting of Gaussian processes (GPs) and Bayesian optimization (BO) to enable a systematic and data-efficient input parametrization. We utilize the results from fast-converging kMC simulations to construct a database for training a cheap-to-evaluate surrogate model based on GPs. Combining the surrogate model with a system-specific acquisition function enables us to apply BO for the guided prediction of suitable input parameters. Thus, the number of trial simulation runs can be considerably reduced, facilitating the efficient use of arbitrary kMC models. We showcase the effectiveness of our methodology for a physical process of growing industrial relevance: the space-charge layer formation in solid-state electrolytes as it occurs in all-solid-state batteries. Our data-driven approach requires only 1-2 iterations to reconstruct the input parameters from different baseline simulations within the training data set. Moreover, we show that the methodology is even capable of accurately extrapolating into regions outside the training data set that are computationally expensive for direct kMC simulation. Finally, we demonstrate the high accuracy of the underlying surrogate model via a full parameter-space investigation, eventually making the original kMC simulation obsolete.
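
The feedback loop described in the abstract (train a GP surrogate on existing simulation results, use an acquisition function to propose the next input parameters, run the simulation, and update the database) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: `toy_kmc` is a hypothetical one-parameter stand-in for an expensive kMC run, the kernel is a fixed squared-exponential, and the acquisition function is a generic lower confidence bound on the misfit to a target observable rather than the system-specific function used by the authors.

```python
import numpy as np

LENGTH_SCALE = 0.3  # assumed kernel length scale (illustrative, not fitted)
SIGNAL_VAR = 1.0    # assumed prior variance of the GP


def toy_kmc(theta):
    """Hypothetical stand-in for an expensive kMC run (one input, one observable)."""
    return theta + 0.3 * np.sin(3.0 * theta)


def rbf_kernel(a, b):
    """Squared-exponential covariance between two 1-D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return SIGNAL_VAR * np.exp(-0.5 * (d / LENGTH_SCALE) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Zero-mean GP regression: posterior mean and standard deviation at x_query."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.clip(SIGNAL_VAR - np.sum(v ** 2, axis=0), 0.0, None)
    return mean, np.sqrt(var)


def acquisition(mean, std, target, kappa=1.0):
    """Lower confidence bound on the absolute misfit to the target observable."""
    return np.abs(mean - target) - kappa * std


def calibrate(target, x_init, n_iter=10, tol=1e-3):
    """Propose inputs with the surrogate until the simulated observable matches target."""
    x_train = np.array(x_init, dtype=float)
    y_train = toy_kmc(x_train)               # initial database of simulation results
    grid = np.linspace(0.0, 2.0, 401)        # candidate input parameters
    for _ in range(n_iter):
        mean, std = gp_posterior(x_train, y_train, grid)
        x_next = grid[np.argmin(acquisition(mean, std, target))]
        y_next = toy_kmc(x_next)              # one new "simulation" per iteration
        x_train = np.append(x_train, x_next)
        y_train = np.append(y_train, y_next)
        if abs(y_next - target) < tol:
            break
    best = np.argmin(np.abs(y_train - target))
    return x_train[best], y_train[best]


# Usage: try to recover the input behind a baseline observable.
baseline = toy_kmc(0.8)
theta_hat, y_hat = calibrate(baseline, [0.0, 0.5, 1.0, 1.5, 2.0])
print(f"proposed input: {theta_hat:.3f}, misfit: {abs(y_hat - baseline):.2e}")
```

In this sketch the surrogate replaces most evaluations of the simulator: only one call to `toy_kmc` is made per iteration, while the acquisition function is evaluated on the full candidate grid using the GP alone. A real application would use multiple input parameters, fit the kernel hyperparameters to the data, and choose an acquisition function suited to the observable of interest.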