Evaluation of Performance Saturation Using the Hadoop Framework

2018 International Conference on High Performance Computing & Simulation (HPCS). Pub Date: 2018-07-01. DOI: 10.1109/HPCS.2018.00164

R. S. Ferreira, B. Batista, Rafael M. D. Frinhani, B. Kuehne, Dionisio Machado Leite Filho, M. Peixoto

Citations: 0

Abstract

It is estimated that about 2.5 exabytes of data are produced daily. This large volume of data has opened up new applications, but managing it has required new technologies. One of the most prominent is the Hadoop framework, which implements a parallel task processing paradigm. This paper presents the results of our group's research analyzing the performance of the Hadoop framework for Big Data processing. The performance evaluation focused on finding the saturation point of Hadoop performance by varying the number of nodes in the cluster under two benchmarks, TeraSort and Pi. The analysis was performed on real infrastructure, with the system deployed on a physical cluster, and provides a general approach to performance analysis of the Hadoop framework for developers and researchers.
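The abstract does not say how the saturation point is determined, so the following is only a minimal sketch of one plausible reading: run the same benchmark on clusters of increasing size and report the node count beyond which adding nodes no longer shortens the runtime by a meaningful fraction. The function names, the 5% threshold, and the runtimes are all hypothetical assumptions for illustration; none of them come from the paper.

```python
from typing import Dict, Optional


def speedups(runtimes: Dict[int, float]) -> Dict[int, float]:
    """Speedup of each cluster size relative to the smallest size measured."""
    base_nodes = min(runtimes)
    base_time = runtimes[base_nodes]
    return {n: base_time / t for n, t in sorted(runtimes.items())}


def saturation_point(runtimes: Dict[int, float],
                     min_gain: float = 0.05) -> Optional[int]:
    """Return the first node count after which moving to the next larger
    cluster reduces runtime by less than `min_gain` (a fraction of the
    current runtime). Return None if every step still gains at least that.

    The threshold is an assumed definition of "saturation", not the paper's.
    """
    sizes = sorted(runtimes)
    for current, larger in zip(sizes, sizes[1:]):
        gain = (runtimes[current] - runtimes[larger]) / runtimes[current]
        if gain < min_gain:
            return current
    return None


# Hypothetical runtimes in seconds, keyed by node count.
# These numbers are invented for illustration and are NOT from the paper.
example_runtimes = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 200.0, 16: 195.0}

print(speedups(example_runtimes))
print(saturation_point(example_runtimes))
```

With these invented numbers, going from 8 to 16 nodes shortens the run by only 2.5%, so the sketch reports 8 nodes as the saturation point. A real study would repeat each measurement and account for variance before drawing that conclusion.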
