R. Khullar, Tushar Sharma, T. Choudhury, R. Mittal
Addressing Challenges of Hadoop for BIG Data Analysis
2018 International Conference on Communication, Computing and Internet of Things (IC3IoT), 2018-02-01
DOI: 10.1109/IC3IOT.2018.8668136
Citations: 2
Abstract
Data has become a necessary part of every individual, industry, economy, business function, and organization. Industries, machines, and institutions are expanding their analytical data in the digital world at a very high rate. As these data sets grow, selecting the relevant information becomes a laborious task. The on-command, on-demand nature of the digital universe has therefore given rise to a category of data called Big Data, named for its sheer velocity, volume, and variety. The term is used to distinguish datasets whose size is beyond the ability of conventional database software tools to manage, evaluate, and store. Big Data poses distinctive computational and analytical challenges, including measurement errors, scalability and storage bottlenecks, and noise accumulation. Because of these characteristics, such data is stored in Hadoop's distributed file system (HDFS). However, Hadoop is fairly complex. As Hadoop is new to many users, this paper discusses the important challenges and issues faced during data mining and deployment of the file system. The aim of this paper is to make users comfortable with Hadoop.