Bi-objective evolutionary hyper-heuristics in automated machine learning for text classification tasks

Impact factor 8.2 · Zone 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
Jonathan Estrella-Ramírez, Jorge de la Calleja, Juan Carlos Gómez Carranza
Journal: Swarm and Evolutionary Computation, volume 98, Article 102073
Published: 2025-07-25
DOI: 10.1016/j.swevo.2025.102073
URL: https://www.sciencedirect.com/science/article/pii/S2210650225002317

Abstract

This paper proposes an evolutionary model based on hyper-heuristics to automate the selection of classification methods for text datasets under a bi-objective approach. The model has three nested levels. At the first level, individual methods classify datasets, recording two performance measures that are often in conflict: the number of misclassifications and the computational time. At the second level, hyper-heuristics, each a set of rules of the form if → then, select classification methods for datasets based on 16 meta-features representing the data distribution. The fitness of a hyper-heuristic is evaluated on a training group of datasets by aggregating the two low-level performances of the chosen methods. At the third level, the multi-objective evolutionary algorithm Strength Pareto Evolutionary Algorithm 2 evolves hyper-heuristic populations with the two objectives of minimizing the aggregated performances. The result is a Pareto-approximated front of hyper-heuristics, which offers a range of solutions from computationally efficient to high classification performance. Finally, the model evaluates the front on an independent test group of datasets and selects those hyper-heuristics that are not dominated. We evaluate the resulting fronts through extensive experiments, measuring several quality indicators. We compare the model's fronts with a baseline front consisting of non-dominated individual classification methods and four state-of-the-art automated machine learning tools (AutoKeras, AutoGluon, H2O, and TPOT). The proposed model yields larger, more diverse Pareto-approximated fronts that outperform the baseline front, allowing solution selection based on available resources and trade-offs between performance and cost.
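The rule-based selection, the aggregated bi-objective fitness, and the final non-dominated filtering described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the meta-feature `n_docs`, the method names, the performance numbers, the use of a plain sum as the aggregation, and the first-matching-rule policy are all assumptions made for illustration, and the evolutionary search with Strength Pareto Evolutionary Algorithm 2 is omitted entirely.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Every name, threshold, and number here is hypothetical, not from the paper.
Objectives = Tuple[float, float]  # (misclassifications, seconds), both minimized


@dataclass
class Rule:
    condition: Callable[[Dict[str, float]], bool]  # "if" part over meta-features
    method: str                                    # "then" part: chosen classifier


def select_method(rules: List[Rule], meta: Dict[str, float], default: str) -> str:
    """Return the method of the first rule that fires, else the default."""
    for rule in rules:
        if rule.condition(meta):
            return rule.method
    return default


def fitness(rules: List[Rule], datasets, perf, default: str = "nb") -> Objectives:
    """Aggregate (assumed here: sum) both performances over a dataset group."""
    errors, seconds = 0.0, 0.0
    for name, meta in datasets.items():
        e, t = perf[(name, select_method(rules, meta, default))]
        errors += e
        seconds += t
    return errors, seconds


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b: no worse in both objectives and different from b."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def non_dominated(points: Dict[str, Objectives]) -> List[str]:
    """Names of the points that no other point dominates."""
    return sorted(k for k, p in points.items()
                  if not any(dominates(q, p) for q in points.values()))


# Toy first-level results: (dataset, method) -> (misclassifications, seconds).
datasets = {"d1": {"n_docs": 500.0}, "d2": {"n_docs": 50000.0}}
perf = {
    ("d1", "svm"): (10, 2.0), ("d1", "nb"): (20, 0.5), ("d1", "knn"): (25, 3.0),
    ("d2", "svm"): (100, 60.0), ("d2", "nb"): (180, 3.0), ("d2", "knn"): (200, 90.0),
}

# Four hand-written hyper-heuristics standing in for an evolved population.
population = {
    "hh_fast": [],                                           # always the default
    "hh_accurate": [Rule(lambda m: True, "svm")],
    "hh_mixed": [Rule(lambda m: m["n_docs"] < 1000, "svm")],
    "hh_knn": [Rule(lambda m: True, "knn")],
}

scores = {name: fitness(rules, datasets, perf) for name, rules in population.items()}
front = non_dominated(scores)
print(scores)
print(front)
```

In this toy setting the always-kNN hyper-heuristic is worse on both objectives than the others and drops out, while the remaining three form a front ranging from fast to accurate, which is the kind of trade-off range the abstract describes.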

Source journal
Swarm and Evolutionary Computation (Computer Science, Artificial Intelligence; Computer Science, Theory & Methods)
CiteScore: 16.00
Self-citation rate: 12.00%
Articles published: 169
Journal description: Swarm and Evolutionary Computation is a pioneering peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes the publication of high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence. Additionally, it welcomes survey papers on current topics and novel applications. Topics of interest include but are not limited to: Genetic Algorithms and Genetic Programming, Evolution Strategies and Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Fireflies Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing Systems, and Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.