A Comparative Study of the Stability of Filter based Feature Selection Algorithms

Rikta Sen, A. K. Mandal, Saptarsi Goswami, B. Chakraborty
Published in: 2019 IEEE 10th International Conference on Awareness Science and Technology (iCAST), 2019-10-01
DOI: 10.1109/ICAwST.2019.8923245 (https://doi.org/10.1109/ICAwST.2019.8923245)
Citations: 3

Abstract

Feature selection is an important step before the classification stage in machine learning, pattern recognition, and data mining problems, where it addresses the high dimensionality of the data. It removes irrelevant and redundant features, which simplifies the classification process and improves accuracy. Several feature selection algorithms have been proposed so far, and the quality of the selected feature subset varies from algorithm to algorithm. One measure for assessing the quality of a feature selection algorithm is its stability. Stability refers to the robustness of the selected feature set to small changes in the training set or in the algorithm's parameters. This work presents a comparative study of the stability of several well-known filter-based feature selection algorithms that produce ranked feature subsets. Fifteen benchmark datasets from the UCI repository are used for simulation experiments. Three types of stability measures (index-based, rank-based, and weight-based) are used to evaluate the stability of the feature selection algorithms. Simulation results demonstrate that, for most of the datasets, the JMD-based feature selection algorithm exhibits greater stability than the other algorithms regardless of the type of stability measure. Relief is observed to be the least stable.
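
The abstract names three families of stability measures without saying which specific measure is used in each. As a minimal sketch, assuming one commonly used representative per family (the Jaccard index for index-based, Spearman's rank correlation for rank-based, and Pearson's correlation for weight-based), stability can be estimated by averaging the measure over all pairs of feature selection runs on perturbed training sets. The function names and the toy weights below are hypothetical and are not taken from the paper.

```python
import itertools

import numpy as np


def jaccard_index(subset_a, subset_b):
    """Index-based: overlap between two selected feature subsets."""
    a, b = set(subset_a), set(subset_b)
    return len(a & b) / len(a | b)


def spearman_rank_correlation(ranks_a, ranks_b):
    """Rank-based: Spearman's rho for two full rankings without ties."""
    ranks_a = np.asarray(ranks_a, dtype=float)
    ranks_b = np.asarray(ranks_b, dtype=float)
    m = ranks_a.size
    squared_diff = np.sum((ranks_a - ranks_b) ** 2)
    return float(1.0 - 6.0 * squared_diff / (m * (m ** 2 - 1)))


def pearson_weight_correlation(weights_a, weights_b):
    """Weight-based: Pearson correlation between two feature weight vectors."""
    return float(np.corrcoef(weights_a, weights_b)[0, 1])


def weights_to_ranks(weights):
    """Convert feature weights to ranks (1 = highest weight)."""
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(-weights)
    ranks = np.empty(weights.size, dtype=int)
    ranks[order] = np.arange(1, weights.size + 1)
    return ranks


def top_k(weights, k):
    """Indices of the k highest-weighted features."""
    return np.argsort(-np.asarray(weights, dtype=float))[:k].tolist()


def average_pairwise(measure, runs):
    """Stability estimate: mean of `measure` over all pairs of runs."""
    pairs = list(itertools.combinations(runs, 2))
    return sum(measure(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature weights from three runs of one filter method,
# each run on a slightly different sample of the training data.
weight_runs = [
    [0.9, 0.5, 0.1, 0.3],
    [0.8, 0.6, 0.2, 0.1],
    [0.7, 0.2, 0.4, 0.3],
]

index_stability = average_pairwise(
    jaccard_index, [top_k(w, 2) for w in weight_runs]
)
rank_stability = average_pairwise(
    spearman_rank_correlation, [weights_to_ranks(w) for w in weight_runs]
)
weight_stability = average_pairwise(pearson_weight_correlation, weight_runs)

print(index_stability, rank_stability, weight_stability)
```

All three scores reach 1 when every run agrees perfectly. An index-based measure only sees which features fall in the selected subset, while rank-based and weight-based measures also react to reordering or rescaling among the features, which is one reason a comparison can report all three side by side.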