{"title":"Performance evaluation of HDFS in big data management","authors":"Dipayan Dev, Ripon Patgiri","doi":"10.1109/ICHPCA.2014.7045330","DOIUrl":null,"url":null,"abstract":"The size of the data used in today's enterprises has been growing at an exponential rate over the last few years. Simultaneously, the need to process and analyze these large volumes of data has also increased. To handle and analyze large datasets, the open-source Apache framework Hadoop is widely used nowadays. For managing and storing all the resources across its cluster, Hadoop possesses a distributed file system called the Hadoop Distributed File System (HDFS). HDFS is written entirely in Java and is designed so that it can store Big Data reliably and stream it at high throughput to user applications. Hadoop is now used widely by prominent organizations such as Yahoo, Facebook, and various online shopping vendors. Meanwhile, other experiments on data-intensive computation have attempted to parallelize data processing, but none of them has achieved the desired performance. Hadoop, with its MapReduce parallel data-processing capability, can achieve these goals efficiently [1]. This paper first provides a detailed overview of HDFS. It then reports experimental work on Hadoop with big data and identifies the various factors that affect Hadoop cluster performance. The paper concludes with the real-world challenges Hadoop faces today and the scope for future work.","PeriodicalId":197528,"journal":{"name":"2014 International Conference on High Performance Computing and Applications (ICHPCA)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on High Performance Computing and Applications (ICHPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICHPCA.2014.7045330","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 16
Abstract
The size of the data used in today's enterprises has been growing at an exponential rate over the last few years. Simultaneously, the need to process and analyze these large volumes of data has also increased. To handle and analyze large datasets, the open-source Apache framework Hadoop is widely used nowadays. For managing and storing all the resources across its cluster, Hadoop possesses a distributed file system called the Hadoop Distributed File System (HDFS). HDFS is written entirely in Java and is designed so that it can store Big Data reliably and stream it at high throughput to user applications. Hadoop is now used widely by prominent organizations such as Yahoo, Facebook, and various online shopping vendors. Meanwhile, other experiments on data-intensive computation have attempted to parallelize data processing, but none of them has achieved the desired performance. Hadoop, with its MapReduce parallel data-processing capability, can achieve these goals efficiently [1]. This paper first provides a detailed overview of HDFS. It then reports experimental work on Hadoop with big data and identifies the various factors that affect Hadoop cluster performance. The paper concludes with the real-world challenges Hadoop faces today and the scope for future work.
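The MapReduce model the abstract refers to splits a job into a map phase that emits key-value pairs and a reduce phase that aggregates them per key; Hadoop distributes both phases across the cluster. A minimal local sketch of that pattern (word count, the canonical MapReduce example) is shown below in Python for illustration only — it is not the Hadoop API, and the function names here are assumptions chosen for clarity:

```python
# Minimal sketch of the MapReduce word-count pattern. In Hadoop the
# map and reduce phases run in parallel on cluster nodes; here they
# run sequentially in one process purely to illustrate the data flow.
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by key and sum the counts.
    # The dictionary stands in for Hadoop's shuffle/sort stage.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data big cluster", "data data"]
counts = reduce_phase(map_phase(lines))
print(counts)  # {'big': 2, 'data': 3, 'cluster': 1}
```

Because each map call depends only on its own input line and each reduce key is aggregated independently, both phases parallelize naturally — which is the property the paper's experiments exercise at cluster scale.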