Online Feature Selection for Streaming Features with High Redundancy Using Sliding-Window Sampling

Dianlong You, Xindong Wu, Limin Shen, Zhen Chen, Chuan Ma, Song Deng
Published in: 2018 IEEE International Conference on Big Knowledge (ICBK)
Publication date: 2018-11-01
DOI: 10.1109/ICBK.2018.00035
URL: https://doi.org/10.1109/ICBK.2018.00035
Citations: 6

Abstract

In recent years, online feature selection has received much attention in data mining. Its aim is to reduce the dimensionality of streaming features by removing irrelevant and redundant features in real time. Existing methods such as Alpha-investing, OSFS, and SAOLA serve this purpose but have drawbacks: low prediction accuracy, a large number of selected features, and possible overflow of streaming features when those features are highly relevant to each other. In this paper, we propose an online learning algorithm, named OSFSW, that uses a sliding-window strategy to sample streaming features in real time and analyzes conditional independence to discard irrelevant and redundant features, with the aim of overcoming these drawbacks. With OSFSW, we obtain an approximate Markov blanket that contains fewer selected features while retaining high prediction accuracy. To validate the algorithm, we implement it and test its performance on widely used datasets from NIPS 2003 and the Causality Workbench. Extensive experimental results show that OSFSW significantly improves prediction accuracy and selects fewer features than Alpha-investing, OSFS, and SAOLA.
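
The abstract gives no pseudocode, so the following is only a rough sketch of the general idea it describes (features arriving in windows, with conditional-independence tests used to discard irrelevant and redundant ones). It is not the authors' OSFSW algorithm. The function names, the window size, the significance level `alpha`, the choice of a Fisher z-test on partial correlations, and the shortcut of conditioning on all other selected features at once are all assumptions made for illustration.

```python
import math

import numpy as np


def ci_pvalue(x, y, Z=None):
    """Fisher z-test p-value for H0: x is independent of y given the columns of Z.

    Assumes roughly linear-Gaussian data; this test is an illustrative choice.
    """
    n = len(x)
    k = 0 if Z is None else Z.shape[1]
    if k > 0:
        # Partial correlation: regress Z (plus an intercept) out of x and y.
        A = np.column_stack([np.ones(n), Z])
        x = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]
        y = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    r = np.corrcoef(x, y)[0, 1]
    if not np.isfinite(r):
        return 1.0  # degenerate (constant) input: treat as independent
    r = float(np.clip(r, -0.999999, 0.999999))
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail


def sliding_window_select(X, y, window=5, alpha=0.01):
    """Hypothetical windowed selection; columns of X arrive `window` at a time."""
    selected = []
    for start in range(0, X.shape[1], window):
        # Relevance: keep newly arrived features that depend on y marginally.
        for j in range(start, min(start + window, X.shape[1])):
            if ci_pvalue(X[:, j], y) < alpha:
                selected.append(j)
        # Redundancy: drop a feature that is independent of y given the
        # other selected features (a simplification: no subset search).
        for j in list(selected):
            others = [i for i in selected if i != j]
            if others and ci_pvalue(X[:, j], y, X[:, others]) >= alpha:
                selected.remove(j)
    return selected


def make_toy_stream(seed=0, n=2000):
    """Synthetic data: columns 0 and 2 drive y, column 1 is a noisy copy of 0."""
    rng = np.random.default_rng(seed)
    x0, x1 = rng.normal(size=n), rng.normal(size=n)
    y = x0 + x1 + 0.5 * rng.normal(size=n)
    X = np.column_stack(
        [x0, x0 + 0.5 * rng.normal(size=n), x1, rng.normal(size=(n, 5))]
    )
    return X, y


if __name__ == "__main__":
    X, y = make_toy_stream()
    print(sliding_window_select(X, y))
```

In this toy setup the noisy copy in column 1 is relevant on its own but carries no extra information once column 0 is selected, which is the kind of redundancy the abstract targets. A faithful implementation would follow the paper's own sampling and testing procedure.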