{"title":"Performance evaluation of HDFS in big data management","authors":"Dipayan Dev, Ripon Patgiri","doi":"10.1109/ICHPCA.2014.7045330","DOIUrl":null,"url":null,"abstract":"The size of the data used in today's enterprises has been growing at an exponential rate over the last few years. Simultaneously, the need to process and analyze these large volumes of data has also increased. To handle and analyze large datasets, the open-source Apache framework Hadoop is widely used nowadays. For managing and storing all the resources across its cluster, Hadoop possesses a distributed file system called the Hadoop Distributed File System (HDFS). HDFS is written entirely in Java and is designed so that it can store Big Data reliably and stream it at high throughput to user applications. Hadoop is now used widely by prominent organizations such as Yahoo, Facebook, and various online shopping vendors. Meanwhile, other experiments on data-intensive computation have attempted to parallelize data processing, but none of them has achieved the desired performance. Hadoop, with its MapReduce parallel data-processing capability, can achieve these goals efficiently [1]. This paper first provides a detailed overview of HDFS. It then reports experimental work on Hadoop with big data and identifies the various factors that affect Hadoop cluster performance. The paper concludes with the real-world challenges Hadoop faces today and the scope for future work.","PeriodicalId":197528,"journal":{"name":"2014 International Conference on High Performance Computing and Applications (ICHPCA)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on High Performance Computing and Applications (ICHPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICHPCA.2014.7045330","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 16
Abstract
The size of the data used in today's enterprises has been growing at an exponential rate over the last few years. Simultaneously, the need to process and analyze these large volumes of data has also increased. To handle and analyze large datasets, the open-source Apache framework Hadoop is widely used nowadays. For managing and storing all the resources across its cluster, Hadoop possesses a distributed file system called the Hadoop Distributed File System (HDFS). HDFS is written entirely in Java and is designed so that it can store Big Data reliably and stream it at high throughput to user applications. Hadoop is now used widely by prominent organizations such as Yahoo, Facebook, and various online shopping vendors. Meanwhile, other experiments on data-intensive computation have attempted to parallelize data processing, but none of them has achieved the desired performance. Hadoop, with its MapReduce parallel data-processing capability, can achieve these goals efficiently [1]. This paper first provides a detailed overview of HDFS. It then reports experimental work on Hadoop with big data and identifies the various factors that affect Hadoop cluster performance. The paper concludes with the real-world challenges Hadoop faces today and the scope for future work.
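The MapReduce model the abstract refers to splits a job into a map phase that emits key-value pairs and a reduce phase that aggregates them per key; Hadoop distributes both phases across the cluster. A minimal local sketch of that pattern (word count, the canonical MapReduce example) is shown below in Python for illustration only — it is not the Hadoop API, and the function names here are assumptions chosen for clarity:

```python
# Minimal sketch of the MapReduce word-count pattern. In Hadoop the
# map and reduce phases run in parallel on cluster nodes; here they
# run sequentially in one process purely to illustrate the data flow.
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by key and sum the counts.
    # The dictionary stands in for Hadoop's shuffle/sort stage.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data big cluster", "data data"]
counts = reduce_phase(map_phase(lines))
print(counts)  # {'big': 2, 'data': 3, 'cluster': 1}
```

Because each map call depends only on its own input line and each reduce key is aggregated independently, both phases parallelize naturally — which is the property the paper's experiments exercise at cluster scale.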