Improving Symbolic Regression for Predicting Materials Properties with Iterative Variable Selection

J Chem Theory Comput. 2022 Aug 9;18(8):4945-4951. doi: 10.1021/acs.jctc.2c00281. Epub 2022 Jul 14.

Abstract

Symbolic regression offers a promising avenue for describing the structure-property relationships of materials with explicit mathematical expressions, yet it faces challenges when the key variables are unclear because of the high complexity of the problem. In this work, we propose to address this difficulty by automatically searching for important variables in a large pool of input features. A new algorithm that integrates symbolic regression with iterative variable selection (VS) was designed to optimize models with a large number of input features. Using the recent method SISSO for symbolic regression and random search for variable selection, we show that VS-assisted SISSO (VS-SISSO) can effectively manage even hundreds of input features, a regime that is computationally prohibitive for SISSO alone, and that it converges quickly to (near) optimal solutions when the model complexity is not high. The efficiency of this approach in improving the accuracy of symbolic regression in materials science was demonstrated in two showcase applications: learning approximate equations for the band gap of inorganic halide perovskites and for the stability of single-atom alloy catalysts.
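The abstract does not spell out the VS loop, so the following is only a minimal sketch of how iterative variable selection by random search might wrap a regressor. Everything in it is an assumption made for illustration: the function names `fit_stand_in` and `vs_random_search`, the subset size, the rule of keeping the features used by the best model so far, and above all the regressor itself, which is a cheap two-feature least-squares fit standing in for SISSO and not SISSO.

```python
import itertools
import random

import numpy as np


def fit_stand_in(X, y, subset, dim=2):
    """Hypothetical stand-in for the symbolic regressor (NOT SISSO).

    Tries every `dim`-feature linear model (plus intercept) within `subset`
    and returns (rmse, features_used) for the best one.
    """
    best = (float("inf"), ())
    for combo in itertools.combinations(sorted(subset), dim):
        A = np.column_stack([X[:, list(combo)], np.ones(len(y))])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rmse = float(np.sqrt(np.mean((A @ coef - y) ** 2)))
        if rmse < best[0]:
            best = (rmse, combo)
    return best


def vs_random_search(X, y, subset_size=8, n_iter=200, seed=0):
    """Assumed VS loop: random search over small feature subsets.

    Each iteration keeps the features used by the best model found so far,
    fills the rest of the subset at random from the full pool, refits, and
    accepts the new model only if its error is strictly lower.
    """
    rng = random.Random(seed)
    n_features = X.shape[1]
    best_rmse, best_used = float("inf"), ()
    for _ in range(n_iter):
        keep = set(best_used)
        others = [j for j in range(n_features) if j not in keep]
        subset = keep | set(rng.sample(others, subset_size - len(keep)))
        rmse, used = fit_stand_in(X, y, subset)
        if rmse < best_rmse:
            best_rmse, best_used = rmse, used
    return best_rmse, best_used


# Synthetic demo: 100 candidate features, only two of which matter.
data_rng = np.random.default_rng(0)
X_demo = data_rng.normal(size=(200, 100))
y_demo = 3.0 * X_demo[:, 5] - 2.0 * X_demo[:, 42]
rmse_demo, used_demo = vs_random_search(X_demo, y_demo, n_iter=300)
```

The point of the design is that the expensive regressor only ever sees `subset_size` features at a time, so its cost no longer grows with the size of the full pool; the price is that convergence depends on how many iterations the random search is given.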