Towards Scalable and Accurate Online Feature Selection for Big Data

Kui Yu, Xindong Wu, W. Ding, J. Pei
2014 IEEE International Conference on Data Mining (ICDM), published 2014-12-14. DOI: 10.1145/2976744 (https://doi.org/10.1145/2976744). Citations: 143.

Abstract

Feature selection is important in many big data applications, and it faces at least two critical challenges. First, in many applications the dimensionality is extremely high, in the millions, and keeps growing. Second, feature selection has to be highly scalable, preferably online, so that each feature can be processed in a sequential scan. In this paper, we develop SAOLA, a Scalable and Accurate OnLine Approach for feature selection. Building on a theoretical analysis of a lower bound on the pairwise correlations between features in the currently selected feature subset, SAOLA employs novel online pairwise comparison techniques to address both challenges and maintain a parsimonious model over time. An empirical study on a series of real-world benchmark data sets shows that SAOLA scales to data sets of extremely high dimensionality and outperforms state-of-the-art feature selection methods.
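The abstract describes the method only at a high level. The following is a minimal Python sketch of a single-pass online filter built on pairwise comparisons, assuming discrete features and empirical mutual information as the correlation measure. The function names, the relevance threshold `delta`, and the exact discard and evict rules are illustrative assumptions, not a reproduction of SAOLA as published. Each arriving feature is examined once. It is dropped if it is irrelevant to the class, dropped if an already selected feature makes it redundant, and otherwise admitted, possibly evicting selected features that it makes redundant.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Empirical mutual information I(X;Y), in bits, of two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi


def online_pairwise_selection(
    stream: Iterable[Tuple[str, Sequence[Hashable]]],
    target: Sequence[Hashable],
    delta: float = 0.0,
) -> List[str]:
    """Hypothetical sketch: scan features once, in arrival order, and return the kept names.

    Assumption: the MI-based redundancy rules below are illustrative only and
    are not claimed to be the published SAOLA algorithm.
    """
    # name -> (feature values, relevance I(feature; target))
    selected: Dict[str, Tuple[Sequence[Hashable], float]] = {}
    for name, values in stream:
        rel = mutual_information(values, target)
        if rel <= delta:
            continue  # not relevant enough to the class: discard immediately

        # Pairwise correlations between the newcomer and every selected feature.
        corr = {other: mutual_information(values, ovals) for other, (ovals, _) in selected.items()}

        # Discard the newcomer if some selected feature is at least as relevant
        # and at least as correlated with the newcomer as the newcomer is with the class.
        if any(orel >= rel and corr[other] >= rel for other, (_, orel) in selected.items()):
            continue

        # Otherwise evict selected features that the newcomer dominates in the same sense.
        for other in [o for o, (_, orel) in selected.items() if rel > orel and corr[o] >= orel]:
            del selected[other]
        selected[name] = (values, rel)
    return list(selected)


if __name__ == "__main__":
    target = [0, 0, 1, 1, 0, 1, 0, 1]
    stream = [
        ("weak", [0, 0, 1, 1, 0, 0, 0, 1]),  # partially informative about the class
        ("strong", list(target)),            # perfectly informative
        ("dup", list(target)),               # exact copy of "strong"
        ("const", [0] * 8),                  # carries no information
    ]
    print(online_pairwise_selection(stream, target))  # ['strong']
```

In this sketch each arriving feature is compared only against the current selected subset, so the work per feature grows with the size of that subset rather than with the total number of features seen so far. That property is what makes this style of filter attractive when the dimensionality keeps growing.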