Comparison of Queueing Data Structures for Kinetic Monte Carlo Simulations of Heterogeneous Catalysts

J Phys Chem A. 2020 Sep 24;124(38):7843-7856. doi: 10.1021/acs.jpca.0c06871. Epub 2020 Sep 14.

Abstract

On-lattice kinetic Monte Carlo (KMC) is a computational method used to simulate, among other phenomena, physicochemical processes on catalytic surfaces. The KMC algorithm propagates the system through discrete configurations by selecting, with the use of random numbers, the next elementary process to be simulated, e.g., adsorption, desorption, diffusion, or reaction. One implementation of such a selection procedure is the first-reaction method, in which all realizable elementary processes are identified and assigned a random occurrence time based on their rate constants. The next event to be executed is then the one with the minimum occurrence time. Thus, a fast and efficient algorithm for selecting the most imminent process, and for performing all of the necessary updates on the list of realizable processes after execution, is of great importance. In the current work, we implement five data structures to handle the elementary-process queue during a KMC run: an unsorted list, a binary heap, a pairing heap, a one-way skip list, and, finally, a novel two-way skip list with a mapping array specialized for KMC simulations. We also investigate the effect of compiler optimizations on the performance of these data structures for three benchmark models, capturing CO oxidation, a simplified water-gas shift mechanism, and a temperature-programmed desorption run. Excluding the unsorted list, which is the least efficient and impractical for large problems, we observe a 3× speedup of the binary and pairing heaps (the most efficient structures) over the one-way skip list (the least efficient of the remaining structures). Compiler optimizations deliver a speedup of up to 1.8×. These benchmarks provide valuable insight into the importance of often-overlooked, implementation-related aspects of KMC simulations, such as the queueing data structures. Our results could be particularly useful in guiding the choice of data structures and algorithms that minimize the computational cost of large-scale simulations.
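
As an illustration of the first-reaction method described above, the minimal sketch below shows how a binary-heap priority queue can serve as the elementary-process queue: each realizable process is assigned a tentative occurrence time drawn from an exponential distribution parameterized by its rate constant, and the most imminent process is repeatedly extracted from the heap. This is not the authors' implementation; the process names, rate-constant values, and the use of Python's heapq module are illustrative assumptions only.

```python
import heapq
import math
import random

def draw_occurrence_time(t_now, rate_constant):
    """Sample a tentative occurrence time for a process with rate constant k:
    t_now - ln(u)/k, with u uniform on (0, 1]."""
    u = 1.0 - random.random()  # shift to (0, 1] so log() is always defined
    return t_now - math.log(u) / rate_constant

# Hypothetical elementary processes and rate constants (illustrative only).
rate_constants = {"CO_adsorption": 1.0, "CO_desorption": 0.1, "CO_oxidation": 0.5}

t = 0.0
queue = []  # binary min-heap of (occurrence_time, process_name) tuples
for name, k in rate_constants.items():
    heapq.heappush(queue, (draw_occurrence_time(t, k), name))

for _ in range(5):
    # Select the most imminent process (minimum occurrence time) and execute it.
    t, process = heapq.heappop(queue)
    print(f"t = {t:.4f}: execute {process}")
    # Executing an event changes the lattice state; affected processes would be
    # removed or re-queued. Here we simply redraw a time for the same process.
    heapq.heappush(queue, (draw_occurrence_time(t, rate_constants[process]), process))
```

A heap keeps both insertion and extraction of the minimum at O(log n) cost, which is consistent with the reported advantage of the binary and pairing heaps over an O(n) scan of an unsorted list as the number of realizable processes grows.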