MonteCat: A Basin-Hopping-Inspired Catalyst Descriptor Search Algorithm for Machine Learning Models

J Chem Inf Model. 2024 Mar 11;64(5):1512-1521. doi: 10.1021/acs.jcim.3c01952. Epub 2024 Feb 22.

Abstract

Proposing catalyst descriptors that relate a catalyst's composition to its actual performance is an ongoing area of research in catalyst informatics, as it is a necessary step toward improving our understanding of the target reactions. Herein, a small descriptor-engineered data set containing 3289 descriptor variables and the performance of 200 catalysts for the oxidative coupling of methane (OCM) is analyzed, and a descriptor search algorithm based on the workflow of the basin-hopping optimization methodology is proposed to select the descriptors that best fit a predictive model. The algorithm, which can be considered a wrapper method, successively generates random modifications to the descriptor subset used in a regression model and adopts or rejects them depending on their effect on the model's score. The algorithm was tested on linear and support vector regression models, yielding average cross-validation r² scores of 0.8268 and 0.6875, respectively.
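
The search loop described in the abstract can be illustrated with a minimal Python sketch. It is not the published MonteCat implementation. The function names (`propose`, `descriptor_search`), the add/remove/swap move set, the Metropolis-style acceptance rule borrowed from basin-hopping, and the `temperature` parameter are all assumptions made for illustration. The `score` callable stands in for whatever model score is used; in a setting like the one described, it would presumably be the average cross-validation r² of a regression model fitted on the selected descriptor columns.

```python
import math
import random
from typing import Callable, FrozenSet, Tuple

Subset = FrozenSet[int]


def propose(subset: Subset, n_descriptors: int, rng: random.Random) -> Subset:
    """Return a randomly modified copy of `subset` (hypothetical move set)."""
    current = set(subset)
    outside = [i for i in range(n_descriptors) if i not in current]
    moves = []
    if outside:
        moves.append("add")
    if len(current) > 1:
        moves.append("remove")  # never empty the subset
    if outside and current:
        moves.append("swap")
    if not moves:
        return subset  # nothing can be changed
    move = rng.choice(moves)
    if move in ("remove", "swap"):
        current.remove(rng.choice(sorted(current)))
    if move in ("add", "swap"):
        current.add(rng.choice(outside))
    return frozenset(current)


def descriptor_search(
    n_descriptors: int,
    score: Callable[[Subset], float],
    n_steps: int = 500,
    temperature: float = 0.01,  # assumed parameter, not from the paper
    seed: int = 0,
) -> Tuple[Subset, float]:
    """Wrapper-style search: higher `score` is better."""
    rng = random.Random(seed)
    current = frozenset([rng.randrange(n_descriptors)])
    current_score = score(current)
    best, best_score = current, current_score
    for _ in range(n_steps):
        candidate = propose(current, n_descriptors, rng)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Always adopt improvements; adopt worse subsets with a probability
        # that shrinks as the score drop grows (Metropolis-style criterion).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


if __name__ == "__main__":
    # Toy score: reward three "relevant" descriptors, penalize all others.
    relevant = frozenset({2, 5, 7})
    toy = lambda s: len(s & relevant) - 0.5 * len(s - relevant)
    print(descriptor_search(20, toy, n_steps=2000, temperature=0.05, seed=1))
```

Passing the score as a callable keeps the search independent of the model, which matches the wrapper character of the method: the same loop could drive a linear model or a support vector regressor by swapping the scoring function.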

MeSH terms

  • Algorithms*
  • Machine Learning*