Comparative Study on Swarm Search Feature Selection for Big Data Stream Mining

S. Meera, B. Jeetha
Automation and Autonomous System, vol. 5, no. 1. DOI: 10.36039/AA012017003 (https://doi.org/10.36039/AA012017003)

Abstract

Networking technology has advanced rapidly and now handles very large amounts of data at once. This data can be structured, semi-structured, or unstructured, and big data technology is gaining importance as a way to mine valuable information from it efficiently. Data mining applications are used in both the public and private sectors because of their advantage over conventional networking technology in analyzing large volumes of real-time data. Data mining relies mainly on the three V's: Volume, Variety, and Velocity. Volume refers to the huge amount of data collected, Velocity refers to the speed at which the data is processed, and Variety refers to the multi-dimensional nature of the data, which may include numbers, dates, strings, geospatial data, 3D data, audio files, video files, social files, etc. Data stored in a big data system comes from different sources, at different rates, and in different types, so it is not synchronized. This is one of the biggest challenges in working with big data. The second challenge is mining valuable and relevant information from such data while satisfying the third V, Velocity. Speed is highly important because it is associated with the cost of processing. In addition, when mining high-dimensional data, the search space from which an optimal feature subset must be determined grows in size, which places heavy demands on computation. To address these problems, research in this area generally builds on the high dimensionality and streaming structure of data feeds in big data and proposes new lightweight feature selection methodologies for such data. Some of this research suggests that different kinds of optimization methods for data stream mining could lead to tremendous changes in big data.
This work discusses various research methods aimed at finding efficient feature selection methods that avoid these main challenges and produce optimal solutions. Previous methods are described along with their advantages and disadvantages, so that further research can be better focused. Experiments were carried out on all of the surveyed methods in a MATLAB simulation environment, and the methods were compared with one another under different performance measures to identify the better-performing methodologies.
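To make the search-space problem concrete: with n candidate features there are 2^n - 1 non-empty feature subsets, so exhaustive search quickly becomes infeasible and a swarm search samples the space instead. The sketch below is only an illustration, not the method of any paper surveyed here. It uses binary particle swarm optimization as one representative swarm search, a synthetic toy dataset, and a nearest-centroid accuracy with a per-feature penalty as the fitness. All function names, parameter values, and the data itself are assumptions introduced for this example.

```python
import math
import random


def search_space_size(n_features):
    """Number of non-empty feature subsets for n_features candidates."""
    return 2 ** n_features - 1


def make_toy_data(rng, n_rows=200, n_features=8):
    """Synthetic data (assumption): only features 0 and 1 carry signal."""
    rows, labels = [], []
    for _ in range(n_rows):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 2.0 * label
        row[1] -= 2.0 * label
        rows.append(row)
        labels.append(label)
    return rows, labels


def fitness(mask, rows, labels, penalty=0.01):
    """Nearest-centroid training accuracy on the selected features,
    minus a small penalty per selected feature (illustrative choice)."""
    selected = [i for i, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(r[i] for r in members) / len(members)
                          for i in selected]
    correct = 0
    for r, y in zip(rows, labels):
        point = [r[i] for i in selected]
        predicted = min(
            centroids,
            key=lambda c: sum((a - b) ** 2
                              for a, b in zip(point, centroids[c])),
        )
        correct += predicted == y
    return correct / len(rows) - penalty * len(selected)


def binary_pso(rows, labels, rng, n_particles=10, n_iters=20,
               w=0.7, c1=1.5, c2=1.5):
    """Binary PSO: each particle is a 0/1 mask over the features."""
    n = len(rows[0])
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, rows, labels) for p in pos]
    g = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for k in range(n_particles):
            for j in range(n):
                v = (w * vel[k][j]
                     + c1 * rng.random() * (pbest[k][j] - pos[k][j])
                     + c2 * rng.random() * (gbest[j] - pos[k][j]))
                vel[k][j] = max(-6.0, min(6.0, v))  # clamp the velocity
                # A sigmoid turns the velocity into a probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-vel[k][j]))
                pos[k][j] = 1 if rng.random() < prob else 0
            f = fitness(pos[k], rows, labels)
            if f > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[k][:], f
    return gbest, gbest_fit


if __name__ == "__main__":
    rows, labels = make_toy_data(random.Random(0))
    mask, score = binary_pso(rows, labels, random.Random(1))
    print("subsets to search exhaustively:", search_space_size(len(rows[0])))
    print("selected feature mask:", mask, "fitness:", score)
```

With the default settings the swarm evaluates 10 initial masks plus 200 more (10 particles over 20 iterations), and that evaluation count stays fixed while the number of subsets doubles with every added feature. In a streaming setting the fitness would have to be evaluated on a window of recent records rather than on a fixed dataset, which this sketch does not attempt.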