Boosting Initial Population in Multiobjective Feature Selection with Knowledge-Based Partitioning

Ayça Deniz, Hakan Ezgi Kiziloz
{"title":"Boosting Initial Population in Multiobjective Feature Selection with Knowledge-Based Partitioning","authors":"Ayça Deniz, Hakan Ezgi Kiziloz","doi":"10.1109/IJCNN55064.2022.9892123","DOIUrl":null,"url":null,"abstract":"The quality of features is one of the main factors that affect classification performance. Feature selection aims to remove irrelevant and redundant features from data in order to increase classification accuracy. However, identifying these features is not a trivial task due to a large search space. Evolutionary algorithms have been proven to be effective in many optimization problems, including feature selection. These algorithms require an initial population to start their search mechanism, and a poor initial population may cause getting stuck in local optima. Diversifying the initial population is known as an effective approach to overcome this issue; yet, it may not suffice as the search space grows exponentially with increasing feature sizes. In this study, we propose an enhanced initial population strategy to boost the performance of the feature selection task. In our proposed method, we ensure the diversity of the initial population by partitioning the candidate solutions according to their selected number of features. In addition, we adjust the chances of features being selected into a candidate solution regarding their information gain values, which enables wise selection of features among a vast search space. We conduct extensive experiments on many benchmark datasets retrieved from UCI Machine Learning Repository. Moreover, we apply our algorithm on a real-world, large-scale dataset, i.e., Stanford Sentiment Treebank. We observe significant improvements after the comparisons with three off-the-shelf initialization strategies.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN55064.2022.9892123","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The quality of features is one of the main factors that affect classification performance. Feature selection aims to remove irrelevant and redundant features from data in order to increase classification accuracy. However, identifying these features is not a trivial task due to the large search space. Evolutionary algorithms have been proven effective in many optimization problems, including feature selection. These algorithms require an initial population to start their search mechanism, and a poor initial population may cause the search to get stuck in local optima. Diversifying the initial population is known to be an effective approach to overcome this issue; yet, it may not suffice as the search space grows exponentially with increasing feature sizes. In this study, we propose an enhanced initial population strategy to boost the performance of the feature selection task. In our proposed method, we ensure the diversity of the initial population by partitioning the candidate solutions according to their number of selected features. In addition, we adjust the chances of features being selected into a candidate solution according to their information gain values, which enables wise selection of features within a vast search space. We conduct extensive experiments on benchmark datasets retrieved from the UCI Machine Learning Repository. Moreover, we apply our algorithm to a real-world, large-scale dataset, namely the Stanford Sentiment Treebank. We observe significant improvements in comparisons with three off-the-shelf initialization strategies.
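The abstract describes two mechanisms: partitioning the initial population by the number of selected features, and biasing feature selection toward high information gain. The Python sketch below illustrates one plausible reading of such an initialization scheme; it is not the paper's implementation. The function name `knowledge_based_init`, the use of scikit-learn's `mutual_info_classif` as a stand-in for information gain, and the even spread of target subset sizes are all assumptions for illustration.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def knowledge_based_init(X, y, pop_size, seed=None):
    """Build an initial population of binary feature masks.

    Diversity: the population is partitioned so that individuals target
    different subset sizes. Knowledge: features enter a mask with
    probability proportional to their estimated relevance.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]

    # Per-feature relevance scores; mutual information is used here as a
    # stand-in for the paper's information gain values (assumption).
    gain = mutual_info_classif(X, y, random_state=0)
    probs = (gain + 1e-9) / (gain + 1e-9).sum()  # avoid zero probabilities

    # Partition the population across target subset sizes, from very
    # sparse masks up to all features selected.
    targets = np.linspace(1, n_features, num=pop_size, dtype=int)

    population = np.zeros((pop_size, n_features), dtype=bool)
    for i, k in enumerate(targets):
        # Sample k distinct features, biased toward high-gain ones.
        chosen = rng.choice(n_features, size=k, replace=False, p=probs)
        population[i, chosen] = True
    return population

# Usage sketch on a toy dataset (hypothetical shapes).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 30))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    pop = knowledge_based_init(X, y, pop_size=50, seed=1)
    print(pop.sum(axis=1))  # varied subset sizes across the population
```

Spreading target subset sizes across the population gives a multiobjective optimizer initial coverage along the feature-count objective, while the gain-biased sampling concentrates the search on promising features from the start.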