Parallel fractional dominance MOEAs for feature subset selection in big data

Impact factor 8.2; Zone 1, Computer Science; Q1, Computer Science, Artificial Intelligence

Swarm and Evolutionary Computation Pub Date : 2024-09-03 DOI:10.1016/j.swevo.2024.101687

Yelleti Vivek , Vadlamani Ravi , Ponnuthurai Nagaratnam Suganthan , P. Radha Krishna

Volume 91, Article 101687. Publication type: Journal Article. Open access: no. https://www.sciencedirect.com/science/article/pii/S2210650224002256

Citations: 0

Abstract

In this paper, we solve the feature subset selection (FSS) problem with three objective functions, namely cardinality, area under the receiver operating characteristic curve (AUC), and Matthews correlation coefficient (MCC), using novel multi-objective evolutionary algorithms (MOEAs). MOEAs often converge poorly because the number of non-dominated solutions grows and the search becomes trapped in local optima. The problem worsens on large, voluminous, high-dimensional datasets. To address these challenges, we propose parallel, fractional dominance-based MOEAs for FSS under Spark. Further, to improve the exploitation capability of MOEAs, we introduce a novel batch opposition-based learning (BOP) scheme along with a cardinality constraint on the opposite solution. We propose two variants, BOP1 and BOP2. In BOP1, a single neighbour is randomly chosen in the opposite solution space, whereas in BOP2, a group of neighbours is randomly chosen in the opposite solution space. In either case, the opposite solutions are evaluated to improve the exploitation capability of the underlying MOEAs. In terms of mean optimal objective function values across all datasets, the proposed BOP2 variant of the parallel fractional dominance-based algorithms emerges as the top performer in obtaining efficient solutions. Further, we introduce a novel metric, the ratio of hypervolume (HV) to inverted generational distance (IGD), HV/IGD, that combines diversity and convergence. With respect to the mean HV/IGD computed over 20 runs and Formula 1 racing, the BOP1 variants of fractional dominance-based MOEAs outperformed the other algorithms.
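The abstract does not spell out how opposite solutions and their neighbours are generated, so the following is only a minimal Python sketch of one plausible reading, not the authors' implementation. It assumes a solution is a binary feature mask, that the "opposite" is the bitwise complement, that the cardinality constraint is an upper bound on the number of selected features (the hypothetical parameter max_features), and that a "neighbour" differs from the opposite solution by one flipped bit. All function names are illustrative.

```python
import random

def opposite(mask):
    """Bitwise complement of a binary feature mask (assumed notion of 'opposite')."""
    return [1 - b for b in mask]

def enforce_cardinality(mask, max_features, rng):
    """Repair a mask so that between 1 and max_features features are selected."""
    out = list(mask)
    selected = [i for i, b in enumerate(out) if b]
    excess = len(selected) - max_features
    if excess > 0:
        # Too many features: switch off randomly chosen ones.
        for i in rng.sample(selected, excess):
            out[i] = 0
    elif not selected:
        # Empty subset is not a usable solution: switch on one random feature.
        out[rng.randrange(len(out))] = 1
    return out

def random_neighbour(mask, max_features, rng):
    """Flip one random bit, then repair any cardinality violation."""
    out = list(mask)
    i = rng.randrange(len(out))
    out[i] = 1 - out[i]
    return enforce_cardinality(out, max_features, rng)

def bop_candidates(mask, max_features, batch_size, rng):
    """Sample batch_size neighbours around the constrained opposite solution.

    batch_size == 1 loosely corresponds to BOP1, batch_size > 1 to BOP2.
    """
    opp = enforce_cardinality(opposite(mask), max_features, rng)
    return [random_neighbour(opp, max_features, rng) for _ in range(batch_size)]

if __name__ == "__main__":
    rng = random.Random(42)
    parent = [1, 0, 1, 1, 0, 0, 0, 0]
    print(bop_candidates(parent, max_features=3, batch_size=1, rng=rng))  # BOP1-style
    print(bop_candidates(parent, max_features=3, batch_size=5, rng=rng))  # BOP2-style
```

In a full algorithm, each candidate would then be scored on the three objectives (cardinality, AUC, MCC) and compared with the current population; that evaluation step is omitted here because it depends on the classifier and dataset.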


Source journal

Swarm and Evolutionary Computation (Computer Science, Artificial Intelligence; Computer Science, Theory & Methods)

CiteScore: 16.00

Self-citation rate: 12.00%

Articles published: 169

About the journal: Swarm and Evolutionary Computation is a pioneering peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes the publication of high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence. It also welcomes survey papers on current topics and novel applications. Topics of interest include, but are not limited to: Genetic Algorithms and Genetic Programming, Evolution Strategies and Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Firefly Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing Systems, and Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.