{"title":"Genetic Programming-based Feature Selection for Symbolic Regression on Incomplete Data.","authors":"Baligh Al-Helali, Qi Chen, Bing Xue, Mengjie Zhang","doi":"10.1162/evco_a_00362","DOIUrl":"https://doi.org/10.1162/evco_a_00362","url":null,"abstract":"High-dimensionality is one of the serious real-world data challenges in symbolic regression and it is more challenging if the data are incomplete. Genetic programming has been successfully utilised for high-dimensional tasks due to its natural feature selection ability, but it is not directly applicable to incomplete data. Commonly, it needs to impute the missing values first and then perform genetic programming on the imputed complete data. However, in the case of having many irrelevant features being incomplete, intuitively, it is not necessary to perform costly imputations on such features. For this purpose, this work proposes a genetic programming-based approach to select features directly from incomplete high-dimensional data to improve symbolic regression performance. We extend the concept of identity/neutral elements from mathematics into the function operators of genetic programming, thus they can handle the missing values in incomplete data. Experiments have been conducted on a number of data sets considering different missingness ratios in high-dimensional symbolic regression tasks. The results show that the proposed method leads to better symbolic regression results when compared with state-of-the-art methods that can select features directly from incomplete data. 
Further results show that our approach not only leads to better symbolic regression accuracy but also selects a smaller number of relevant features, and consequently improves both the effectiveness and the efficiency of the learning process.","PeriodicalId":50470,"journal":{"name":"Evolutionary Computation","volume":" ","pages":"1-27"},"PeriodicalIF":4.6,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142689431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Monotone Chance-Constrained Submodular Functions Using Evolutionary Multi-Objective Algorithms.","authors":"Aneta Neumann, Frank Neumann","doi":"10.1162/evco_a_00360","DOIUrl":"https://doi.org/10.1162/evco_a_00360","url":null,"abstract":"Many real-world optimization problems can be stated in terms of submodular functions. Furthermore, these real-world problems often involve uncertainties which may lead to the violation of given constraints. A lot of evolutionary multi-objective algorithms following the Pareto optimization approach have recently been analyzed and applied to submodular problems with different types of constraints. We present a first runtime analysis of evolutionary multi-objective algorithms based on Pareto optimization for chance-constrained submodular functions. Here the constraint involves stochastic components and the constraint can only be violated with a small probability of α. We investigate the classical GSEMO algorithm for two different bi-objective formulations using tail bounds to determine the feasibility of solutions. We show that the algorithm GSEMO obtains the same worst case performance guarantees for monotone submodular functions as recently analyzed greedy algorithms for the case of uniform IID weights and uniformly distributed weights with the same dispersion when using the appropriate bi-objective formulation. As part of our investigations, we also point out situations where the use of tail bounds in the first bi-objective formulation can prevent GSEMO from obtaining good solutions in the case of uniformly distributed weights with the same dispersion if the objective function is submodular but non-monotone due to a single element impacting monotonicity. Furthermore, we investigate the behavior of the evolutionary multi-objective algorithms GSEMO, NSGA-II and SPEA2 on different submodular chance-constrained network problems. 
Our experimental results show that the use of evolutionary multi-objective algorithms leads to significant performance improvements compared to state-of-the-art greedy algorithms for submodular optimization.","PeriodicalId":50470,"journal":{"name":"Evolutionary Computation","volume":" ","pages":"1-35"},"PeriodicalIF":4.6,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142331627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genetic Programming for Automatically Evolving Multiple Features to Classification.","authors":"Peng Wang, Bing Xue, Jing Liang, Mengjie Zhang","doi":"10.1162/evco_a_00359","DOIUrl":"https://doi.org/10.1162/evco_a_00359","url":null,"abstract":"Performing classification on high-dimensional data poses a significant challenge due to the huge search space. Moreover, complex feature interactions introduce an additional obstacle. The problems can be addressed by using feature selection to select relevant features or feature construction to construct a small set of high-level features. However, performing feature selection or feature construction only might make the feature set suboptimal. To remedy this problem, this study investigates the use of genetic programming for simultaneous feature selection and feature construction in addressing different classification tasks. The proposed approach is tested on 16 datasets and compared with seven methods including both feature selection and feature construction techniques. The results show that the obtained feature sets with the constructed and/or selected features can significantly increase the classification accuracy and reduce the dimensionality of the datasets. Further analysis reveals the complementarity of the obtained features leading to the promising classification performance of the proposed method.","PeriodicalId":50470,"journal":{"name":"Evolutionary Computation","volume":" ","pages":"1-27"},"PeriodicalIF":4.6,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142299919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovering and Exploiting Sparse Rewards in a Learned Behavior Space","authors":"Giuseppe Paolo;Miranda Coninx;Alban Laflaquière;Stephane Doncieux","doi":"10.1162/evco_a_00343","DOIUrl":"10.1162/evco_a_00343","url":null,"abstract":"Learning optimal policies in sparse rewards settings is difficult as the learning agent has little to no feedback on the quality of its actions. In these situations, a good strategy is to focus on exploration, hopefully leading to the discovery of a reward signal to improve on. A learning algorithm capable of dealing with this kind of setting has to be able to (1) explore possible agent behaviors and (2) exploit any possible discovered reward. Exploration algorithms have been proposed that require the definition of a low-dimension behavior space, in which the behavior generated by the agent's policy can be represented. The need to design a priori this space such that it is worth exploring is a major limitation of these algorithms. In this work, we introduce STAX, an algorithm designed to learn a behavior space on-the-fly and to explore it while optimizing any reward discovered (see Figure 1). It does so by separating the exploration and learning of the behavior space from the exploitation of the reward through an alternating two-step process. In the first step, STAX builds a repertoire of diverse policies while learning a low-dimensional representation of the high-dimensional observations generated during the policies evaluation. In the exploitation step, emitters optimize the performance of the discovered rewarding solutions. 
Experiments conducted on three different sparse reward environments show that STAX performs comparably to existing baselines while requiring much less prior information about the task as it autonomously builds the behavior space it explores.","PeriodicalId":50470,"journal":{"name":"Evolutionary Computation","volume":"32 3","pages":"275-305"},"PeriodicalIF":4.6,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41171496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Preliminary Analysis of Simple Novelty Search","authors":"R. Paul Wiegand","doi":"10.1162/evco_a_00340","DOIUrl":"10.1162/evco_a_00340","url":null,"abstract":"Novelty search is a powerful tool for finding diverse sets of objects in complicated spaces. Recent experiments on simplified versions of novelty search introduce the idea that novelty search happens at the level of the archive space, rather than individual points. The sparseness measure and archive update criterion create a process that is driven by two measures: (1) spread out to cover the space while trying to remain as efficiently packed as possible, and (2) metrics inspired by k nearest neighbor theory. In this paper, we generalize previous simplifications of novelty search to include traditional population (μ,λ) dynamics for generating new search points, where the population and the archive are updated separately. We provide some theoretical guidance regarding balancing mutation and sparseness criteria and introduce the concept of saturation as a way of talking about fully covered spaces. We show empirically that claims that novelty search is inherently objectiveless are incorrect. 
We leverage the understanding of novelty search as an optimizer of archive coverage, suggest several ways to improve the search, and demonstrate one simple improvement—generating some new points directly from the archive rather than the parent population.","PeriodicalId":50470,"journal":{"name":"Evolutionary Computation","volume":"32 3","pages":"249-273"},"PeriodicalIF":4.6,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9828886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Tri-Objective Method for Bi-Objective Feature Selection in Classification","authors":"Ruwang Jiao;Bing Xue;Mengjie Zhang","doi":"10.1162/evco_a_00339","DOIUrl":"10.1162/evco_a_00339","url":null,"abstract":"Minimizing the number of selected features and maximizing the classification performance are two main objectives in feature selection, which can be formulated as a bi-objective optimization problem. Due to the complex interactions between features, a solution (i.e., feature subset) with poor objective values does not mean that all the features it selects are useless, as some of them combined with other complementary features can greatly improve the classification performance. Thus, it is necessary to consider not only the performance of feature subsets in the objective space, but also their differences in the search space, to explore more promising feature combinations. To this end, this paper proposes a tri-objective method for bi-objective feature selection in classification, which solves a bi-objective feature selection problem as a tri-objective problem by considering the diversity (differences) between feature subsets in the search space as the third objective. The selection based on the converted tri-objective method can maintain a balance between minimizing the number of selected features, maximizing the classification performance, and exploring more promising feature subsets. Furthermore, a novel initialization strategy and an offspring reproduction operator are proposed to promote the diversity of feature subsets in the objective space and improve the search ability, respectively. The proposed algorithm is compared with five multiobjective-based feature selection methods, six typical feature selection methods, and two peer methods with diversity as a helper objective. 
Experimental results on 20 real-world classification datasets suggest that the proposed method outperforms the compared methods in most scenarios.","PeriodicalId":50470,"journal":{"name":"Evolutionary Computation","volume":"32 3","pages":"217-248"},"PeriodicalIF":4.6,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9822009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IOHexperimenter: Benchmarking Platform for Iterative Optimization Heuristics","authors":"Jacob de Nobel;Furong Ye;Diederick Vermetten;Hao Wang;Carola Doerr;Thomas Bäck","doi":"10.1162/evco_a_00342","DOIUrl":"10.1162/evco_a_00342","url":null,"abstract":"We present IOHexperimenter, the experimentation module of the IOHprofiler project. IOHexperimenter aims at providing an easy-to-use and customizable toolbox for benchmarking iterative optimization heuristics such as local search, evolutionary and genetic algorithms, and Bayesian optimization techniques. IOHexperimenter can be used as a stand-alone tool or as part of a benchmarking pipeline that uses other modules of the IOHprofiler environment. IOHexperimenter provides an efficient interface between optimization problems and their solvers while allowing for granular logging of the optimization process. Its logs are fully compatible with existing tools for interactive data analysis, which significantly speeds up the deployment of a benchmarking pipeline. The main components of IOHexperimenter are the environment to build customized problem suites and the various logging options that allow users to steer the granularity of the data records.","PeriodicalId":50470,"journal":{"name":"Evolutionary Computation","volume":"32 3","pages":"205-210"},"PeriodicalIF":4.6,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9862561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pflacco: Feature-Based Landscape Analysis of Continuous and Constrained Optimization Problems in Python","authors":"Raphael Patrick Prager;Heike Trautmann","doi":"10.1162/evco_a_00341","DOIUrl":"10.1162/evco_a_00341","url":null,"abstract":"The herein proposed Python package pflacco provides a set of numerical features to characterize single-objective continuous and constrained optimization problems. Thereby, pflacco addresses two major challenges in the area of optimization. Firstly, it provides the means to develop an understanding of a given problem instance, which is crucial for designing, selecting, or configuring optimization algorithms in general. Secondly, these numerical features can be utilized in the research streams of automated algorithm selection and configuration. While the majority of these landscape features are already available in the R package flacco, our Python implementation offers these tools to an even wider audience and thereby promotes research interests and novel avenues in the area of optimization.","PeriodicalId":50470,"journal":{"name":"Evolutionary Computation","volume":"32 3","pages":"211-216"},"PeriodicalIF":4.6,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9867698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Machine Learning Methods to Assess Module Performance Contribution in Modular Optimization Frameworks.","authors":"Ana Kostovska, Diederick Vermetten, Peter Korošec, Sašo Džeroski, Carola Doerr, Tome Eftimov","doi":"10.1162/evco_a_00356","DOIUrl":"https://doi.org/10.1162/evco_a_00356","url":null,"abstract":"Modular algorithm frameworks not only allow for combinations never tested in manually selected algorithm portfolios, but they also provide a structured approach to assess which algorithmic ideas are crucial for the observed performance of algorithms. In this study, we propose a methodology for analyzing the impact of the different modules on the overall performance. We consider modular frameworks for two widely used families of derivative-free black-box optimization algorithms, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and differential evolution (DE). More specifically, we use performance data of 324 modCMA-ES and 576 modDE algorithm variants (with each variant corresponding to a specific configuration of modules) obtained on the 24 BBOB problems for 6 different runtime budgets in 2 dimensions. Our analysis of these data reveals that the impact of individual modules on overall algorithm performance varies significantly. Notably, among the examined modules, the elitism module in CMA-ES and the linear population size reduction module in DE exhibit the most significant impact on performance. Furthermore, our exploratory data analysis of problem landscape data suggests that the most relevant landscape features remain consistent regardless of the configuration of individual modules, but the influence that these features have on regression accuracy varies. In addition, we apply classifiers that exploit feature importance with respect to the trained models for performance prediction and performance data, to predict the modular configurations of CMA-ES and DE algorithm variants. 
The results show that the predicted configurations do not exhibit a statistically significant difference in performance compared to the true configurations, with the percentage varying depending on the setup (from 49.1% to 95.5% for mod-CMA and 21.7% to 77.1% for DE).","PeriodicalId":50470,"journal":{"name":"Evolutionary Computation","volume":" ","pages":"1-27"},"PeriodicalIF":4.6,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141890747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-Scale Multiobjective Evolutionary Algorithm Guided by Low-Dimensional Surrogates of Scalarization Functions.","authors":"Haoran Gu, Handing Wang, Cheng He, Bo Yuan, Yaochu Jin","doi":"10.1162/evco_a_00354","DOIUrl":"https://doi.org/10.1162/evco_a_00354","url":null,"abstract":"Recently, computationally intensive multiobjective optimization problems have been efficiently solved by surrogate-assisted multiobjective evolutionary algorithms. However, most of those algorithms can only handle no more than 200 decision variables. As the number of decision variables increases further, unreliable surrogate models will result in a dramatic deterioration of their performance, which makes large-scale expensive multiobjective optimization challenging. To address this challenge, we develop a large-scale multiobjective evolutionary algorithm guided by low-dimensional surrogate models of scalarization functions. The proposed algorithm (termed LDS-AF) reduces the dimension of the original decision space based on principal component analysis, and then directly approximates the scalarization functions in a decomposition-based multiobjective evolutionary algorithm. With the help of a two-stage modeling strategy and convergence control strategy, LDS-AF can keep a good balance between convergence and diversity, and achieve a promising performance without being trapped in a local optimum prematurely. 
The experimental results on a set of test instances have demonstrated its superiority over eight state-of-the-art algorithms on multiobjective optimization problems with up to 1000 decision variables using only 500 real function evaluations.","PeriodicalId":50470,"journal":{"name":"Evolutionary Computation","volume":" ","pages":"1-25"},"PeriodicalIF":6.8,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141421720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}