Evaluation of Performance Saturation Using the Hadoop Framework
R. S. Ferreira, B. Batista, Rafael M. D. Frinhani, B. Kuehne, Dionisio Machado Leite Filho, M. Peixoto
2018 International Conference on High Performance Computing & Simulation (HPCS), July 2018. DOI: 10.1109/HPCS.2018.00164
Abstract
It is estimated that about 2.5 exabytes of data are produced daily. This large volume of data has opened up new application possibilities; managing it, however, has required new technologies. One of the most prominent is the Hadoop framework, which implements a parallel task processing paradigm. The aim of this paper is to present the results of our group's research analyzing the performance of the Hadoop framework for Big Data processing. The performance evaluation focused on finding the saturation point of Hadoop performance by varying the number of nodes in the cluster, using two benchmarks: TeraSort and Pi. The analysis was performed on a real infrastructure, with the system implemented on a physical cluster, providing a general approach to performance analysis of the Hadoop framework for developers and researchers.
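The paper itself does not publish its measurement scripts, but the methodology it describes (time the standard TeraSort and Pi example jobs at several cluster sizes and locate the point where adding nodes stops paying off) can be sketched as below. This is a minimal, hypothetical sketch: the examples-jar path, the HDFS paths, the 5% gain threshold in saturation_point, and the illustrative runtimes in the __main__ block are assumptions, not values from the paper.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of the benchmark loop described in the abstract:
time Hadoop's TeraSort and Pi example jobs and flag the cluster size at
which further nodes yield negligible improvement (the saturation point)."""
import subprocess
import time

# Assumed location of the standard Hadoop examples jar; adjust per installation.
EXAMPLES_JAR = "/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar"

def run_job(args):
    """Run one MapReduce example job and return its wall-clock time in seconds."""
    start = time.time()
    subprocess.run(["hadoop", "jar", EXAMPLES_JAR, *args], check=True)
    return time.time() - start

def terasort_time(rows=10_000_000):
    """Generate `rows` records with TeraGen, then time TeraSort over them."""
    subprocess.run(["hdfs", "dfs", "-rm", "-r", "-f", "/bench/in", "/bench/out"], check=True)
    run_job(["teragen", str(rows), "/bench/in"])
    return run_job(["terasort", "/bench/in", "/bench/out"])

def pi_time(maps=16, samples=1_000_000):
    """Time the Pi estimation example with `maps` map tasks."""
    return run_job(["pi", str(maps), str(samples)])

def saturation_point(times_by_nodes, threshold=0.05):
    """Return the first cluster size whose relative runtime gain over the
    previous size falls below `threshold`, i.e. where performance saturates."""
    sizes = sorted(times_by_nodes)
    for prev, cur in zip(sizes, sizes[1:]):
        gain = (times_by_nodes[prev] - times_by_nodes[cur]) / times_by_nodes[prev]
        if gain < threshold:
            return cur
    return None  # no saturation observed in the measured range

if __name__ == "__main__":
    # Runtimes would be collected by re-running terasort_time() / pi_time()
    # after each cluster resize; the numbers below are illustrative only.
    measured = {2: 910.0, 4: 520.0, 6: 380.0, 8: 365.0}
    print("saturation at", saturation_point(measured), "nodes")
```

A usage note: on a real deployment one would rerun the same benchmark several times per cluster size and average the runtimes before applying saturation_point, since single MapReduce runs vary considerably with HDFS block placement and scheduling.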