Feature Selection, Online Feature Selection Techniques for Big Data Classification: A Review

S. Devi, M. Sabrigiriraj
Published in: 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT)
DOI: 10.1109/ICCTCT.2018.8550928
Publication date: 2018-03-01
Citations: 15

Abstract

In recent years, many disciplines have had to deal with huge datasets that contain a very large number of features. Feature Selection (FS) techniques aim to remove noisy, redundant, or irrelevant features, which can degrade classification performance. Although many FS techniques exist, FS remains an active research area in the data mining, machine learning, and pattern recognition communities. As data dimensionality keeps rising, several FS techniques face critical problems of efficiency and usefulness. In particular, conventional techniques lack the scalability to handle datasets with millions of instances and still return results in a short time. An Online Feature Selection (OFS) algorithm can offer a better solution in this setting. This work reviews a few of the available, well-known FS and OFS techniques and points out their pros and cons. It examines in detail traditional FS and OFS techniques based on evolutionary computation, which help obtain feature subsets from huge datasets. The review also summarizes and analyzes machine learning algorithms for huge datasets. In addition, it explains new machine learning strategies and methodologies together with their capacity to deal with different challenges, with the ultimate goal of helping practitioners select suitable solutions for their use cases. Finally, the review offers a view of the big data domain, identifies research gaps and opportunities, and provides a solid foundation for further machine learning research on big datasets.
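To make the contrast between batch FS and online feature selection concrete, here is a minimal sketch of one common OFS style, assumed purely for illustration and not taken from the reviewed paper. A linear classifier sees each labelled instance once and takes a hinge-loss gradient step. It then projects its weights onto an L2 ball and truncates them so that at most `budget` weights stay non-zero; those non-zero weights are the currently selected features. The function names, the parameters `eta`, `lam` and `budget`, and the synthetic data stream are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical sketch of truncation-based online feature selection (OFS).
# Assumption: linear classifier, labels in {-1, +1}, hinge loss; this is an
# illustration, not the method of the reviewed paper.


def truncate_top_b(w: List[float], b: int) -> List[float]:
    """Keep the b largest-magnitude weights and zero out the rest."""
    if b >= len(w):
        return list(w)
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:b])
    return [wi if i in keep else 0.0 for i, wi in enumerate(w)]


def online_feature_selection(
    stream: Sequence[Tuple[List[float], int]],
    n_features: int,
    budget: int,
    eta: float = 0.1,
    lam: float = 0.01,
) -> List[float]:
    """Visit each (x, y) once; at most `budget` weights stay non-zero."""
    w = [0.0] * n_features
    radius = 1.0 / math.sqrt(lam)  # radius of the L2 ball used for projection
    for x, y in stream:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin > 1.0:
            continue  # hinge loss is zero: no update for this instance
        # Gradient step on the L2-regularized hinge loss.
        w = [(1.0 - lam * eta) * wi + eta * y * xi for wi, xi in zip(w, x)]
        # Project back onto the L2 ball of radius 1/sqrt(lam).
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > radius:
            w = [wi * radius / norm for wi in w]
        # Truncate: the non-zero weights are the currently selected features.
        w = truncate_top_b(w, budget)
    return w


def selected_features(w: List[float]) -> List[int]:
    """Indices of features with a non-zero weight."""
    return [i for i, wi in enumerate(w) if wi != 0.0]


# Synthetic stream: 10 features, but the label depends only on features 0 and 1.
rng = random.Random(0)
stream = []
for _ in range(2000):
    x = [rng.gauss(0.0, 1.0) for _ in range(10)]
    y = 1 if 2.0 * x[0] - 1.5 * x[1] > 0 else -1
    stream.append((x, y))

w = online_feature_selection(stream, n_features=10, budget=2)
print(selected_features(w))
```

Each instance is touched once, and only a weight vector of length d is kept in memory. That property is what makes this style of method attractive for streams with millions of instances. The trade-off is that truncation is greedy, so a feature dropped early must win its place back through later updates. On this synthetic stream, the surviving weights should settle on features 0 and 1, the only features the label depends on.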