A new graph feature selection approach

2020 6th IEEE Congress on Information Science and Technology (CiSt). Pub Date: 2020-06-05. DOI: 10.1109/CiSt49399.2021.9357067

Yassine Akhiat, Youssef Asnaoui, M. Chahhou, Ahmed Zinedine


Citations: 7

Abstract


A new graph feature selection approach

Feature selection (FS) is an important pre-processing technique in machine learning and data mining. It aims to select a small subset of relevant and informative features from an original feature space that may contain many irrelevant, redundant, and noisy features. Feature selection usually leads to better performance, better interpretability, and lower computational cost. In the literature, FS methods fall into three main categories: filter, wrapper, and embedded methods. In this paper we introduce a new feature selection method called graph feature selection (GFS). GFS has three main steps. First, we create a weighted graph in which each node corresponds to a feature and the weight between two nodes is computed from a matrix of individual and pairwise scores of a decision tree classifier. Second, at each iteration, we split the graph into two random partitions with the same number of nodes, then keep moving the worst node from one partition to the other until the global modularity converges. Third, from the final best partition, we select the best-ranked features according to a newly proposed variable importance criterion. The results of GFS are compared with those of three well-known feature selection algorithms on nine benchmark datasets. The proposed method proves effective at identifying the most informative feature subset.
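The partitioning step can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions: the abstract does not define "modularity" or "worst node", so the sketch uses the standard Newman weighted modularity and treats the worst node as the one whose move to the other partition raises modularity the most. The weight matrix `W` is made up for illustration and stands in for the decision-tree scores; the function names `modularity` and `bipartition` are hypothetical.

```python
import random


def modularity(W, part):
    """Newman weighted modularity of a node -> partition assignment.

    Assumption: the abstract does not say which modularity definition
    GFS uses; this is the standard one for weighted undirected graphs.
    """
    n = len(W)
    k = [sum(row) for row in W]   # weighted degree of each node
    two_m = sum(k)                # total weight, each edge counted twice
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if part[i] == part[j]:
                q += W[i][j] - k[i] * k[j] / two_m
    return q / two_m


def bipartition(W, seed=0):
    """Random even split, then greedy single-node moves until no gain."""
    rng = random.Random(seed)
    n = len(W)
    nodes = list(range(n))
    rng.shuffle(nodes)
    part = [0] * n
    for idx in nodes[n // 2:]:    # second half of the shuffle goes to side 1
        part[idx] = 1
    best = modularity(W, part)
    while True:
        move, move_q = None, best
        for i in range(n):
            part[i] ^= 1          # tentatively move node i to the other side
            q = modularity(W, part)
            part[i] ^= 1          # undo the tentative move
            if q > move_q + 1e-12:
                move, move_q = i, q
        if move is None:          # converged: no move improves modularity
            return part, best
        part[move] ^= 1
        best = move_q


# Hypothetical weights: features 0-2 and 3-5 form two tightly linked groups.
W = [[0.0 if i == j else (0.9 if (i < 3) == (j < 3) else 0.1)
      for j in range(6)] for i in range(6)]

part, q = bipartition(W, seed=0)
print(part, round(q, 3))
```

Each accepted move strictly increases modularity, so the loop terminates. Recomputing modularity from scratch for every candidate move costs O(n^3) per pass, which is acceptable for a sketch but would call for an incremental update on larger feature sets.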

