Effects of reducing redundant parameters in parameter optimization for symbolic regression using genetic programming

IF 1.1 · Zone 4 (Mathematics) · Q4 COMPUTER SCIENCE, THEORY & METHODS

Journal of Symbolic Computation Pub Date : 2024-12-02 DOI:10.1016/j.jsc.2024.102413

Gabriel Kronberger, Fabrício Olivetti de França

Journal of Symbolic Computation, Volume 129, Article 102413. URL: https://www.sciencedirect.com/science/article/pii/S0747717124001172


Abstract

Gradient-based local optimization has been shown to improve the results of genetic programming (GP) for symbolic regression (SR), a machine learning method for symbolic equation learning. Correspondingly, several state-of-the-art GP implementations use iterative nonlinear least squares (NLS) algorithms for local optimization of parameters. An issue that has mostly been ignored in the SR and GP literature, however, is the overparameterization of SR expressions and, as a consequence, the bad conditioning of the NLS optimization problem. The aim of this study is to analyze the effects of overparameterization on the NLS results and convergence speed, using Operon as an example GP/SR implementation. First, using a set of six selected benchmark problems, we demonstrate that numeric rank approximation can be used to detect overparameterization. In the second part, we analyze whether the NLS results or convergence speed can be improved by simplifying expressions with equality saturation to remove redundant parameters. This analysis is done with the much larger Feynman symbolic regression benchmark set after collecting all expressions visited by GP, as the simplification procedure is not fast enough to be used within GP fitness evaluation. We observe that Operon frequently visits overparameterized solutions, but the number of redundant parameters is small on average. We analyzed the Pareto-optimal expressions of the first and last generation of GP and found that for 70% to 80% of the simplified expressions, the success rate of reaching the optimum was better than or equal to that of the overparameterized form. The effect was smaller for the number of NLS iterations until convergence, where we found fewer or equal iterations for 51% to 63% of the expressions after simplification.
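To make the idea of detecting overparameterization by numeric rank more concrete, here is a minimal Python sketch. It rests on assumptions that are not stated in the abstract: that the rank is estimated from the singular values of the Jacobian of the model output with respect to its parameters, that the Jacobian is approximated by finite differences, and that a relative tolerance of 1e-6 is adequate. The helper names (`jacobian`, `redundant_parameters`) and the toy expressions are hypothetical and are not taken from the paper or from Operon.

```python
import numpy as np

def jacobian(f, theta, X, eps=1e-6):
    """Central finite-difference Jacobian of f(theta, X) w.r.t. theta."""
    theta = np.asarray(theta, dtype=float)
    J = np.empty((X.shape[0], theta.size))
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        J[:, j] = (f(theta + step, X) - f(theta - step, X)) / (2 * eps)
    return J

def redundant_parameters(f, theta, X, rtol=1e-6):
    """Parameter count minus the numeric rank of the Jacobian (assumed criterion)."""
    theta = np.asarray(theta, dtype=float)
    s = np.linalg.svd(jacobian(f, theta, X), compute_uv=False)
    rank = int(np.sum(s > rtol * s[0]))  # singular values above the tolerance
    return theta.size - rank

# Toy expressions (illustrative only).
def overparameterized(theta, x):
    # theta[0] and theta[1] only ever appear as a product.
    return theta[0] * theta[1] * x + theta[2]

def simplified(theta, x):
    # Same model family after merging the product into one parameter.
    return theta[0] * x + theta[1]

X = np.linspace(0.5, 3.0, 50)
n_over = redundant_parameters(overparameterized, [1.5, 2.0, 0.5], X)
n_simple = redundant_parameters(simplified, [3.0, 0.5], X)
print(n_over, n_simple)
```

In the overparameterized form, the Jacobian columns for the two multiplied parameters are proportional to each other, so the sketch reports one redundant parameter, while the simplified form reports none. A rank-deficient Jacobian is also what makes the corresponding NLS problem badly conditioned.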


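Source journal

Journal of Symbolic Computation (Engineering & Technology - Computer Science: Theory & Methods)

CiteScore

2.10

Self-citation rate

14.30%

Number of articles published

Review time

142 days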

About the journal: An international journal, the Journal of Symbolic Computation, founded by Bruno Buchberger in 1985, is directed to mathematicians and computer scientists who have a particular interest in symbolic computation. The journal provides a forum for research in the algorithmic treatment of all types of symbolic objects: objects in formal languages (terms, formulas, programs); algebraic objects (elements in basic number domains, polynomials, residue classes, etc.); and geometrical objects. It is the explicit goal of the journal to promote the integration of symbolic computation by establishing one common avenue of communication for researchers working in the different subareas. It is also important that the algorithmic achievements of these areas should be made available to the human problem-solver in integrated software systems for symbolic computation. To help this integration, the journal publishes invited tutorial surveys as well as Applications Letters and System Descriptions.