Survey of pre-processing techniques for mining big data
Jayaram Hariharakrishnan, S. Mohanavalli, Srividya, K. Kumar
2017 International Conference on Computer, Communication and Signal Processing (ICCCSP)
DOI: 10.1109/ICCCSP.2017.7944072
Citations: 24
Abstract
Big Data analytics has become important because many public and private administrations, organizations, and companies collect and analyze huge amounts of domain-specific information. This information can be useful for problems such as national intelligence, cyber security, fraud detection, marketing, and medical informatics. As ever more data is generated, its constantly changing size, scale, diversity, and complexity have created a need for new architectures, techniques, algorithms, and analytics to manage the data and extract value from it. Progress and innovation are no longer limited by the ability to collect data. They are limited by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely and scalable manner, and by the availability of credible, clean, noise-free data sets. This paper sets out to understand the problems that arise in data preprocessing, particularly those related to data cleaning. It examines the difficulties of applying data cleaning and noise-removal techniques in big data analytics and of mitigating imperfect data, and it reviews some techniques for addressing them. It also identifies shortcomings in existing data reduction techniques within their respective areas of application and assesses how effective current big data preprocessing proposals are on various data sets.