Effects of reducing redundant parameters in parameter optimization for symbolic regression using genetic programming
Gabriel Kronberger, Fabrício Olivetti de França
Journal of Symbolic Computation, volume 129, Article 102413. Published 2024-12-02.
DOI: 10.1016/j.jsc.2024.102413
https://www.sciencedirect.com/science/article/pii/S0747717124001172
Abstract
Gradient-based local optimization has been shown to improve the results of genetic programming (GP) for symbolic regression (SR), a machine learning method for symbolic equation learning. Correspondingly, several state-of-the-art GP implementations use iterative nonlinear least squares (NLS) algorithms for local optimization of parameters. An issue that has, however, mostly been ignored in the SR and GP literature is the overparameterization of SR expressions and, as a consequence, the bad conditioning of the NLS optimization problem. The aim of this study is to analyze the effects of overparameterization on the NLS results and convergence speed, using Operon as an example GP/SR implementation. First, using a set of six selected benchmark problems, we demonstrate that numeric rank approximation can be used to detect overparameterization. In the second part, we analyze whether the NLS results or convergence speed can be improved by simplifying expressions with equality saturation to remove redundant parameters. This analysis uses the much larger Feynman symbolic regression benchmark set and is performed after collecting all expressions visited by GP, as the simplification procedure is not fast enough to be used within GP fitness evaluation. We observe that Operon frequently visits overparameterized solutions, but the number of redundant parameters is small on average. We analyzed the Pareto-optimal expressions of the first and last generation of GP and found that for 70% to 80% of the simplified expressions, the success rate of reaching the optimum was better than or equal to that of the overparameterized form. The effect was smaller for the number of NLS iterations until convergence, where we found the same number or fewer iterations for 51% to 63% of the expressions after simplification.
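To make the idea of detecting overparameterization through numeric rank concrete, the following is a minimal sketch and not the procedure used in the paper or in Operon. It assumes a finite-difference Jacobian of the model with respect to its parameters and estimates the rank with pivoted Gram-Schmidt orthogonalization; the example models, the tolerance `rel_tol`, and all function names are hypothetical choices made for illustration. The number of redundant parameters is then taken as the parameter count minus the estimated rank.

```python
import math

def jacobian(model, theta, xs, h=1e-6):
    """Central finite-difference Jacobian, stored column-wise (one column per parameter)."""
    cols = []
    for j in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[j] += h
        dn[j] -= h
        cols.append([(model(x, up) - model(x, dn)) / (2 * h) for x in xs])
    return cols

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def numeric_rank(cols, rel_tol=1e-6):
    """Estimate the rank by modified Gram-Schmidt with column pivoting.

    rel_tol is an assumed threshold relative to the largest column norm.
    """
    cols = [list(c) for c in cols]
    scale = max((norm(c) for c in cols), default=0.0)
    if scale == 0.0:
        return 0
    rank = 0
    while cols:
        # Pick the remaining column with the largest residual norm.
        k = max(range(len(cols)), key=lambda i: norm(cols[i]))
        pivot = cols.pop(k)
        n = norm(pivot)
        if n <= rel_tol * scale:
            break  # everything left is numerically dependent
        q = [a / n for a in pivot]
        rank += 1
        # Remove the component along q from the remaining columns.
        for c in cols:
            dot = sum(a * b for a, b in zip(q, c))
            for i in range(len(c)):
                c[i] -= dot * q[i]
    return rank

def redundant_parameters(model, theta, xs):
    """Parameter count minus the estimated Jacobian rank."""
    return len(theta) - numeric_rank(jacobian(model, theta, xs))

def overparameterized(x, t):
    # t[0] * t[1] acts as a single coefficient, so one parameter is redundant.
    return t[0] * t[1] * x + t[2]

def simplified(x, t):
    return t[0] * x + t[1]

xs = [0.1 * i for i in range(1, 21)]
print(redundant_parameters(overparameterized, [1.5, -0.7, 2.0], xs))
print(redundant_parameters(simplified, [-1.05, 2.0], xs))
```

In the overparameterized model the Jacobian columns for `t[0]` and `t[1]` are both multiples of `x`, so the estimated rank is lower than the parameter count; the simplified form has full column rank. In practice a singular value decomposition or a rank-revealing QR factorization from a numerical library would be the more usual choice for this step.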
About the journal:
An international journal, the Journal of Symbolic Computation, founded by Bruno Buchberger in 1985, is directed to mathematicians and computer scientists who have a particular interest in symbolic computation. The journal provides a forum for research in the algorithmic treatment of all types of symbolic objects: objects in formal languages (terms, formulas, programs); algebraic objects (elements in basic number domains, polynomials, residue classes, etc.); and geometrical objects.
It is the explicit goal of the journal to promote the integration of symbolic computation by establishing one common avenue of communication for researchers working in the different subareas. It is also important that the algorithmic achievements of these areas should be made available to the human problem-solver in integrated software systems for symbolic computation. To help this integration, the journal publishes invited tutorial surveys as well as Applications Letters and System Descriptions.