Semantic Linear Genetic Programming for Symbolic Regression

IEEE Trans Cybern. 2024 Feb;54(2):1321-1334. doi: 10.1109/TCYB.2022.3181461. Epub 2024 Jan 17.

Abstract

Symbolic regression (SR) is an important problem with many applications, such as automatic programming tasks and data mining. Genetic programming (GP) is a commonly used technique for SR. In the past decade, semantic GP (SGP), a branch of GP that uses program behavior to guide the search, has achieved great success in solving SR problems. However, existing SGP methods focus only on the tree-based chromosome representation and usually suffer from bloat and unsatisfactory generalization ability. To address these issues, we propose a new semantic linear GP (SLGP) algorithm. In SLGP, we design a new chromosome representation that encodes programs and their semantic information in a linear fashion. To use the semantic information more effectively, we further propose a novel semantic genetic operator, mutate-and-divide propagation, which recursively propagates the semantic error within the linear program. The empirical results show that the proposed method achieves lower training and test errors than state-of-the-art algorithms on SR problems while producing much smaller programs.
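
To make the notions of a linear program and its semantics concrete, the following is a minimal Python sketch of a generic register-based linear GP program. It is not the SLGP chromosome representation and does not implement mutate-and-divide propagation, since the abstract gives no details of either. The instruction format, the register initialization, the use of register 0 as output, and the names run_linear_program and semantic_error are all assumptions made for illustration. Here "semantics" means the vector of values an instruction produces over the training inputs, and the semantic error is the gap between the program output and the target on each input.

```python
import operator

# Hypothetical instruction format (an assumption, not taken from the paper):
# (destination register, operator symbol, source register a, source register b)
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def run_linear_program(program, xs, n_registers=2):
    """Run the instruction list on every input in xs.

    Returns the program outputs and, for each instruction, its semantics:
    the value that instruction wrote for each input.
    """
    semantics = [[] for _ in program]
    outputs = []
    for x in xs:
        # Assumption: every register starts at the input value.
        regs = [x] * n_registers
        for i, (dst, op, a, b) in enumerate(program):
            regs[dst] = OPS[op](regs[a], regs[b])
            semantics[i].append(regs[dst])
        # Assumption: register 0 holds the program output.
        outputs.append(regs[0])
    return outputs, semantics


def semantic_error(outputs, targets):
    """Per-input difference between the target and the program output."""
    return [t - o for o, t in zip(outputs, targets)]


# Two instructions: r1 = r0 * r0, then r0 = r1 + r0, which computes x^2 + x.
program = [(1, "*", 0, 0), (0, "+", 1, 0)]
xs = [0.0, 1.0, 2.0]
targets = [x * x + x for x in xs]

outputs, semantics = run_linear_program(program, xs)
errors = semantic_error(outputs, targets)
```

In this sketch each instruction carries its own semantic vector, so an error measured at the output can in principle be related back to individual instructions. How SLGP stores this information and propagates the error is specified in the paper itself, not here.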