Viktor Martinek, Julia Reuter, Ophelia Frotscher, Sanaz Mostaghim, Markus Richter, Roland Herzog
Shape Constraints in Symbolic Regression using Penalized Least Squares
arXiv - CS - Symbolic Computation, 2024-05-31, doi: arxiv-2405.20800
Abstract
We study the addition of shape constraints and their consideration during the
parameter estimation step of symbolic regression (SR). Shape constraints serve
as a means to introduce prior knowledge about the shape of the otherwise
unknown model function into SR. Unlike previous works that have explored shape
constraints in SR, we propose minimizing shape constraint violations during
parameter estimation using gradient-based numerical optimization. We test three
algorithm variants to evaluate their performance in identifying
three symbolic expressions from a synthetically generated data set. This paper
examines two benchmark scenarios: one with varying noise levels and another
with reduced amounts of training data. The results indicate that incorporating
shape constraints into the expression search is particularly beneficial when
data is scarce. Compared to using shape constraints only in the selection
process, our approach of minimizing violations during parameter estimation
shows a statistically significant benefit in some of our test cases, without
being significantly worse in any instance.
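
To make the idea of penalizing shape constraint violations during parameter estimation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical fixed expression f(x) = a*x + b*x^2, a monotonicity constraint f'(z) >= 0 checked on a grid of sample points, and plain gradient descent standing in for the gradient-based optimizer. The data, the penalty weight `lam`, the step size, and all function names are illustrative assumptions.

```python
# Illustrative sketch only: the model, data, and hyperparameters below are
# assumptions made for this example and are not taken from the paper.

def sse(a, b, xs, ys):
    """Sum of squared residuals of f(x) = a*x + b*x**2."""
    return sum((a * x + b * x * x - y) ** 2 for x, y in zip(xs, ys))


def mean_sq_violation(a, b, zs):
    """Mean squared violation of f'(z) >= 0, with f'(z) = a + 2*b*z."""
    return sum(max(0.0, -(a + 2.0 * b * z)) ** 2 for z in zs) / len(zs)


def fit(xs, ys, zs, lam, step=0.01, iters=20000):
    """Minimize sse + lam * mean_sq_violation by plain gradient descent."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        ga, gb = 0.0, 0.0
        # Gradient of the least-squares term.
        for x, y in zip(xs, ys):
            r = a * x + b * x * x - y
            ga += 2.0 * r * x
            gb += 2.0 * r * x * x
        # Gradient of the shape-constraint penalty.
        for z in zs:
            v = max(0.0, -(a + 2.0 * b * z))  # violation size at z
            ga += lam * (-2.0 * v) / len(zs)
            gb += lam * (-2.0 * v) * 2.0 * z / len(zs)
        a -= step * ga
        b -= step * gb
    return a, b


# Scarce, made-up data whose unconstrained fit is not monotone on [0, 1].
xs = [0.0, 0.5, 1.0]
ys = [0.0, 0.6, 0.5]
zs = [j / 10 for j in range(11)]  # points where the constraint is checked

a0, b0 = fit(xs, ys, zs, lam=0.0)   # ordinary least squares
a1, b1 = fit(xs, ys, zs, lam=10.0)  # penalized least squares

print("plain:    ", a0, b0, mean_sq_violation(a0, b0, zs))
print("penalized:", a1, b1, mean_sq_violation(a1, b1, zs))
```

In this toy setting the penalized fit gives up some data fidelity in exchange for a smaller constraint violation. Because the constraint enters as a soft penalty, violations are reduced rather than strictly ruled out, and the balance depends on the assumed weight `lam`.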