Boosting Initial Population in Multiobjective Feature Selection with Knowledge-Based Partitioning

Ayça Deniz, Hakan Ezgi Kiziloz
{"title":"Boosting Initial Population in Multiobjective Feature Selection with Knowledge-Based Partitioning","authors":"Ayça Deniz, Hakan Ezgi Kiziloz","doi":"10.1109/IJCNN55064.2022.9892123","DOIUrl":null,"url":null,"abstract":"The quality of features is one of the main factors that affect classification performance. Feature selection aims to remove irrelevant and redundant features from data in order to increase classification accuracy. However, identifying these features is not a trivial task due to a large search space. Evolutionary algorithms have been proven to be effective in many optimization problems, including feature selection. These algorithms require an initial population to start their search mechanism, and a poor initial population may cause getting stuck in local optima. Diversifying the initial population is known as an effective approach to overcome this issue; yet, it may not suffice as the search space grows exponentially with increasing feature sizes. In this study, we propose an enhanced initial population strategy to boost the performance of the feature selection task. In our proposed method, we ensure the diversity of the initial population by partitioning the candidate solutions according to their selected number of features. In addition, we adjust the chances of features being selected into a candidate solution regarding their information gain values, which enables wise selection of features among a vast search space. We conduct extensive experiments on many benchmark datasets retrieved from UCI Machine Learning Repository. Moreover, we apply our algorithm on a real-world, large-scale dataset, i.e., Stanford Sentiment Treebank. We observe significant improvements after the comparisons with three off-the-shelf initialization strategies.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN55064.2022.9892123","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The quality of features is one of the main factors that affect classification performance. Feature selection aims to remove irrelevant and redundant features from data in order to increase classification accuracy. However, identifying these features is not a trivial task due to the large search space. Evolutionary algorithms have been proven effective in many optimization problems, including feature selection. These algorithms require an initial population to start their search mechanism, and a poor initial population may cause the search to get stuck in local optima. Diversifying the initial population is known to be an effective approach to overcome this issue; yet, it may not suffice as the search space grows exponentially with increasing feature sizes. In this study, we propose an enhanced initial population strategy to boost the performance of the feature selection task. In our proposed method, we ensure the diversity of the initial population by partitioning the candidate solutions according to their number of selected features. In addition, we adjust the chances of features being selected into a candidate solution according to their information gain values, which enables wise selection of features within a vast search space. We conduct extensive experiments on benchmark datasets retrieved from the UCI Machine Learning Repository. Moreover, we apply our algorithm to a real-world, large-scale dataset, namely the Stanford Sentiment Treebank. We observe significant improvements in comparisons with three off-the-shelf initialization strategies.
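The abstract describes two mechanisms: partitioning the initial population by the number of selected features, and biasing feature selection toward high information gain. The Python sketch below illustrates one plausible reading of such an initialization scheme; it is not the paper's implementation. The function name `knowledge_based_init`, the use of scikit-learn's `mutual_info_classif` as a stand-in for information gain, and the even spread of target subset sizes are all assumptions for illustration.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def knowledge_based_init(X, y, pop_size, seed=None):
    """Build an initial population of binary feature masks.

    Diversity: the population is partitioned so that individuals target
    different subset sizes. Knowledge: features enter a mask with
    probability proportional to their estimated relevance.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]

    # Per-feature relevance scores; mutual information is used here as a
    # stand-in for the paper's information gain values (assumption).
    gain = mutual_info_classif(X, y, random_state=0)
    probs = (gain + 1e-9) / (gain + 1e-9).sum()  # avoid zero probabilities

    # Partition the population across target subset sizes, from very
    # sparse masks up to all features selected.
    targets = np.linspace(1, n_features, num=pop_size, dtype=int)

    population = np.zeros((pop_size, n_features), dtype=bool)
    for i, k in enumerate(targets):
        # Sample k distinct features, biased toward high-gain ones.
        chosen = rng.choice(n_features, size=k, replace=False, p=probs)
        population[i, chosen] = True
    return population

# Usage sketch on a toy dataset (hypothetical shapes).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 30))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    pop = knowledge_based_init(X, y, pop_size=50, seed=1)
    print(pop.sum(axis=1))  # varied subset sizes across the population
```

Spreading target subset sizes across the population gives a multiobjective optimizer initial coverage along the feature-count objective, while the gain-biased sampling concentrates the search on promising features from the start.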