Survey of pre-processing techniques for mining big data

Jayaram Hariharakrishnan, S. Mohanavalli, Srividya, K. Kumar
Published in: 2017 International Conference on Computer, Communication and Signal Processing (ICCCSP)
DOI: 10.1109/ICCCSP.2017.7944072 (https://doi.org/10.1109/ICCCSP.2017.7944072)
Citations: 24

Abstract

Big Data analytics has become important as administrations, organizations, and companies, both public and private, collect and analyze huge amounts of domain-specific data that can hold useful information about problems such as national intelligence, cyber security, fraud detection, marketing, and medical informatics. As more data is generated, its ever-changing size, scale, diversity, and complexity call for newer architectures, techniques, algorithms, and analytics to manage it and extract value from it. Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely and scalable manner, and by the need for credible, clean, noise-free data sets. This paper surveys the problems that arise in data preprocessing, in particular those related to cleaning data and to applying data cleaning and noise removal techniques to big data analytics so as to mitigate imperfect data, together with some techniques for solving them. It also identifies the shortcomings of existing reduction techniques in their respective areas of application and examines how effective current big data preprocessing proposals are on various data sets.
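As a rough illustration of the cleaning and noise removal steps the abstract refers to, the sketch below fills missing values with the median and then drops points that lie far from the median. It is not taken from the paper: the function names, the sample readings, and the threshold `k` are all assumptions chosen for the example, and real big data pipelines would apply such steps in a distributed setting.

```python
from statistics import median

def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def drop_outliers(values, k=3.0):
    """Drop points farther than k median absolute deviations from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All points are (nearly) identical; nothing can be flagged as noise.
        return list(values)
    return [v for v in values if abs(v - med) <= k * mad]

# Hypothetical sensor readings with two missing entries and one noisy spike.
readings = [10.1, 9.8, None, 10.3, 250.0, 9.9, None, 10.0]
cleaned = drop_outliers(impute_missing(readings))
print(cleaned)
```

The median is used in both steps because, unlike the mean, it is not pulled toward the noisy spike; this is one common choice among many, not a recommendation from the survey.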