Online Streaming Feature Selection Based on Feature Interaction

Yan Lv, Yaojin Lin, Xiangyan Chen, Dongxing Wang, Chenxi Wang
2020 IEEE International Conference on Knowledge Graph (ICKG), 2020-08-01
DOI: 10.1109/ICBK50248.2020.00017 (https://doi.org/10.1109/ICBK50248.2020.00017)
Citations: 6

Abstract

In many big data applications, online streaming feature selection plays a critical role in processing feature streams and handling high-dimensional problems. However, traditional online streaming feature selection methods focus on identifying relevant, irrelevant, and/or redundant features and ignore the interaction between features; that is, an individual feature may be irrelevant or only weakly correlated with the label, yet become strongly correlated with the label when combined with another irrelevant or weakly correlated feature. In this paper, we propose a novel feature selection algorithm that considers feature interaction based on the neighborhood rough set. The algorithm selects features according to two principles: the discrimination capability of the selected feature subset should be greater than or equal to that of the original feature space, and the number of selected features should be kept as small as possible by exploiting feature interaction. Under this framework, we propose an online significance analysis criterion to select features that are significant relative to the currently selected features, and design an online redundancy analysis criterion to retain highly interactive features and filter out redundant ones. Experimental results on a series of benchmark datasets show that the proposed algorithm significantly outperforms other state-of-the-art online streaming feature selection methods.
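The notion of feature interaction described in the abstract can be illustrated with a small sketch. The code below is not the authors' algorithm: it assumes one common neighborhood rough set dependency measure (the fraction of samples whose delta-neighborhood contains only samples with the same label), and the function name `neighborhood_dependency`, the Chebyshev distance, the radius `delta`, and the XOR toy data are all illustrative choices rather than details taken from the paper.

```python
from itertools import product


def neighborhood_dependency(X, y, features, delta=0.1):
    """Fraction of samples whose delta-neighborhood, measured only on
    `features`, contains no sample with a different label.

    Assumed definition for illustration; the paper may use another one.
    """
    n = len(X)
    consistent = 0
    for i in range(n):
        pure = True
        for j in range(n):
            # Chebyshev distance restricted to the chosen feature indices
            dist = max(abs(X[i][f] - X[j][f]) for f in features)
            if dist <= delta and y[j] != y[i]:
                pure = False
                break
        if pure:
            consistent += 1
    return consistent / n


# Toy XOR data: neither feature alone separates the labels, the pair does.
X = [[a, b] for a, b in product([0.0, 1.0], repeat=2)]
y = [int(a != b) for a, b in X]

dep_f0 = neighborhood_dependency(X, y, [0])
dep_f1 = neighborhood_dependency(X, y, [1])
dep_both = neighborhood_dependency(X, y, [0, 1])
print(dep_f0, dep_f1, dep_both)  # 0.0 0.0 1.0
```

On this toy data each feature alone has dependency 0.0, while the pair reaches 1.0. A selector that scores arriving features one at a time against the label alone would discard both, which is the failure mode the abstract attributes to methods that ignore interaction.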