Bi-objective evolutionary hyper-heuristics in automated machine learning for text classification tasks
Jonathan Estrella-Ramírez, Jorge de la Calleja, Juan Carlos Gómez Carranza
Swarm and Evolutionary Computation, volume 98, Article 102073. Published 2025-07-25.
DOI: 10.1016/j.swevo.2025.102073
URL: https://www.sciencedirect.com/science/article/pii/S2210650225002317
Impact factor: 8.2 (JCR Q1, Computer Science, Artificial Intelligence)
Abstract
This paper proposes an evolutionary model based on hyper-heuristics that automates the selection of classification methods for text datasets under a bi-objective approach. The model has three nested levels. At the first level, individual methods classify datasets and record two performance measures, the number of misclassifications and the computational time, which are often in conflict. At the second level, hyper-heuristics, expressed as sets of rules of the form if → then, select classification methods for datasets based on 16 meta-features representing the data distribution. The fitness of a hyper-heuristic is evaluated on a training group of datasets by aggregating the two low-level performances of the chosen methods. At the third level, the multi-objective Strength Pareto Evolutionary Algorithm 2 (SPEA2) evolves populations of hyper-heuristics with the bi-objective goal of minimizing the two aggregated performances. The result is a Pareto-approximated front of hyper-heuristics, ranging from computationally efficient solutions to solutions with high classification performance. Finally, the model evaluates the front on an independent test group of datasets and keeps the hyper-heuristics that are not dominated. We evaluate the resulting fronts through extensive experiments, measuring several quality indicators, and compare them with a baseline front consisting of non-dominated individual classification methods and four state-of-the-art automated machine learning tools (AutoKeras, AutoGluon, H2O, and TPOT). The proposed model yields larger, more diverse Pareto-approximated fronts that outperform the baseline front, allowing solution selection based on available resources and trade-offs between performance and cost.
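The first two levels and the final non-dominance filter described in the abstract can be illustrated with a minimal Python sketch. Several parts are assumptions made for illustration and are not taken from the paper: a rule fires when its condition is the nearest point to the dataset's meta-features, aggregation is a plain sum, only 2 meta-features are used instead of 16, and all names and numbers (`Rule`, `select_method`, `fitness`, the toy data) are hypothetical. The evolutionary search itself (SPEA2) is not implemented here; the population is written by hand.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# (misclassifications, seconds); both objectives are minimized.
Perf = Tuple[float, float]


@dataclass(frozen=True)
class Rule:
    """One 'if -> then' rule: a point in meta-feature space and a method name."""
    condition: Tuple[float, ...]
    method: str


def select_method(rules: Sequence[Rule], meta: Sequence[float]) -> str:
    """Assumed matching scheme: fire the rule whose condition is nearest."""
    def distance(rule: Rule) -> float:
        return sum((c - m) ** 2 for c, m in zip(rule.condition, meta))
    return min(rules, key=distance).method


def fitness(rules: Sequence[Rule],
            datasets: Dict[str, Tuple[float, ...]],
            results: Dict[str, Dict[str, Perf]]) -> Perf:
    """Sum (assumed aggregation) the two performances of the chosen methods."""
    errors, seconds = 0.0, 0.0
    for name, meta in datasets.items():
        e, t = results[name][select_method(rules, meta)]
        errors += e
        seconds += t
    return errors, seconds


def dominates(a: Perf, b: Perf) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: List[Perf]) -> List[Perf]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Toy first-level results: dataset -> method -> (misclassifications, seconds).
datasets = {"d1": (0.1, 0.9), "d2": (0.8, 0.2)}  # hypothetical meta-features
results = {
    "d1": {"nb": (30.0, 1.0), "svm": (10.0, 5.0)},
    "d2": {"nb": (20.0, 2.0), "svm": (15.0, 8.0)},
}

# A hand-written population of hyper-heuristics (each is a set of rules).
population = {
    "always_nb": [Rule((0.0, 0.0), "nb")],
    "always_svm": [Rule((0.0, 0.0), "svm")],
    "mixed": [Rule((0.0, 1.0), "svm"), Rule((1.0, 0.0), "nb")],
    "inverse": [Rule((0.0, 1.0), "nb"), Rule((1.0, 0.0), "svm")],
}

scores = {name: fitness(rules, datasets, results)
          for name, rules in population.items()}
front = non_dominated(list(scores.values()))
```

In this toy setting, the "inverse" hyper-heuristic is dominated by "mixed" (worse on both errors and time), so it drops out of the front, while the remaining three represent different trade-offs between cost and classification performance.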
About the journal:
Swarm and Evolutionary Computation is a peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes the publication of high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence. Additionally, it welcomes survey papers on current topics and novel applications. Topics of interest include but are not limited to: Genetic Algorithms and Genetic Programming, Evolution Strategies and Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Fireflies Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing Systems, and Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.