Understanding the role of memory subsystem on performance and energy-efficiency of Hadoop applications

2017 Eighth International Green and Sustainable Computing Conference (IGSC) Pub Date : 2017-10-01 DOI:10.1109/IGCC.2017.8323591

Hosein Mohammadi Makrani, Shahab Tabatabaei, S. Rafatirad, H. Homayoun


Citations: 14

Abstract

The memory subsystem has always been one of the performance bottlenecks in computer systems. Given the large size of data, two questions are becoming important: whether Big Data requires big memory, and whether the main memory subsystem plays an intrinsic role in the performance and energy-efficiency of Big Data applications. In this paper, through a comprehensive real-system experimental analysis of performance, power, and resource utilization, we evaluate the main memory characteristics of Hadoop MapReduce, a de facto standard for big data analytics. Using a methodical experimental setup, we analyze the impact of DRAM capacity, operating frequency, and number of channels on power and performance to understand the main memory requirements of this important Big Data framework. The characterization results across various Hadoop MapReduce applications from different domains show that Hadoop MapReduce workloads exhibit one of two distinct behaviors: they are either CPU-intensive or disk-intensive. Our experimental results show that neither DRAM frequency nor the number of channels plays a significant role in the performance of Hadoop workloads. On the other hand, our results indicate that increasing the number of DRAM channels reduces DRAM power and improves the energy-efficiency of Hadoop MapReduce applications.
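To make the studied parameters concrete, the following minimal sketch relates DRAM operating frequency and channel count to theoretical peak bandwidth, and relates power and runtime to energy. It is an illustration only, not the authors' methodology: the names `MemoryConfig`, `peak_bandwidth_gbs`, `energy_joules`, and `energy_delay_product` are hypothetical, the 64-bit channel width is the standard DDR3/DDR4 data bus width, and the energy-delay product is one common energy-efficiency metric that the abstract does not confirm the paper uses.

```python
from dataclasses import dataclass

# A standard DDR3/DDR4 channel has a 64-bit (8-byte) data bus.
BYTES_PER_TRANSFER = 8


@dataclass(frozen=True)
class MemoryConfig:
    """Hypothetical description of a DRAM configuration (illustration only)."""
    channels: int           # number of populated memory channels
    transfer_rate_mts: int  # mega-transfers per second, e.g. 1600 for DDR3-1600


def peak_bandwidth_gbs(cfg: MemoryConfig) -> float:
    """Theoretical peak bandwidth in GB/s: channels x transfers/s x bytes/transfer."""
    return cfg.channels * cfg.transfer_rate_mts * 1e6 * BYTES_PER_TRANSFER / 1e9


def energy_joules(avg_power_w: float, runtime_s: float) -> float:
    """Energy consumed is average power multiplied by execution time."""
    return avg_power_w * runtime_s


def energy_delay_product(avg_power_w: float, runtime_s: float) -> float:
    """Energy-delay product (assumed metric): energy x runtime, lower is better."""
    return energy_joules(avg_power_w, runtime_s) * runtime_s


if __name__ == "__main__":
    # Compare theoretical peak bandwidth across channel counts at one frequency.
    for channels in (1, 2, 4):
        cfg = MemoryConfig(channels=channels, transfer_rate_mts=1600)
        print(f"{channels} channel(s): {peak_bandwidth_gbs(cfg):.1f} GB/s peak")
```

Peak bandwidth scales linearly with both frequency and channel count, so a workload that is insensitive to these parameters, as the abstract reports for Hadoop MapReduce, is not bandwidth-bound. Any power and runtime values passed to the energy helpers would have to come from real measurements; none are supplied here.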

