Survey of pre-processing techniques for mining big data

2017 International Conference on Computer, Communication and Signal Processing (ICCCSP) Pub Date : 1900-01-01 DOI:10.1109/ICCCSP.2017.7944072

Jayaram Hariharakrishnan, S. Mohanavalli, Srividya, K. Kumar


Citations: 24

Abstract

Big data analytics has become important as many public and private administrations, organizations, and companies collect and analyze huge amounts of domain-specific information, which can contain useful information about problems such as national intelligence, cyber security, fraud detection, marketing, and medical informatics. As more and more data is generated, its ever-changing size, scale, diversity, and complexity call for newer architectures, techniques, algorithms, and analytics to manage it and extract value from it. Progress and innovation are no longer hindered by the ability to collect data but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely and scalable fashion, and by the availability of credible, clean, noise-free data sets. This paper attempts to understand the problems to be solved in data preprocessing; to become familiar with the problems of data cleaning; to examine how data cleaning and noise removal techniques can be applied in big data analytics to mitigate imperfect data, together with some techniques for doing so; to identify the shortcomings of existing reduction techniques in their respective areas of application; and to assess the effectiveness of current big data preprocessing proposals on various data sets.
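As a rough illustration of the three kinds of preprocessing the abstract names (mitigating imperfect data, removing noise, and reducing data), a minimal Python sketch follows. The abstract does not specify any particular method, so the choices here are assumptions made only for illustration: median imputation for missing values, clamping to Tukey's fences for noise, and uniform random sampling for reduction. The function names and the sample data are hypothetical.

```python
import random
import statistics


def impute_missing(values):
    """Replace None entries with the median of the observed values.

    Assumption: median imputation stands in for "mitigating imperfect data".
    """
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clamp values lying outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).

    Assumption: clamping extreme values stands in for "noise removal".
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def sample_rows(values, fraction, seed=0):
    """Keep a uniform random subset of the data.

    Assumption: random sampling stands in for "data reduction".
    """
    rng = random.Random(seed)
    size = max(1, int(len(values) * fraction))
    return rng.sample(values, size)


# Hypothetical toy data: two missing readings and one obvious outlier.
raw = [10.0, 12.0, None, 11.0, 250.0, 9.0, None, 13.0]

filled = impute_missing(raw)                   # no None values remain
cleaned = clip_outliers(filled)                # 250.0 is pulled back to the upper fence
reduced = sample_rows(cleaned, fraction=0.5)   # half of the rows are kept
```

On data sets too large for one machine, the same steps would normally be expressed in a distributed framework; this single-process version only shows the logic of each step.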

