Hosein Mohammadi Makrani, Shahab Tabatabaei, S. Rafatirad, H. Homayoun
Understanding the role of memory subsystem on performance and energy-efficiency of Hadoop applications
2017 Eighth International Green and Sustainable Computing Conference (IGSC), 2017-10-01
DOI: 10.1109/IGCC.2017.8323591 (https://doi.org/10.1109/IGCC.2017.8323591)
Citations: 14
Abstract
The memory subsystem has always been one of the performance bottlenecks in computer systems. Given the large size of the data involved, the questions of whether Big Data requires big memory and whether the main memory subsystem plays an intrinsic role in the performance and energy efficiency of Big Data applications are becoming important. In this paper, through a comprehensive real-system experimental analysis of performance, power, and resource utilization, we evaluate the main memory characteristics of Hadoop MapReduce, a de facto standard for Big Data analytics. Using a methodical experimental setup, we analyze the impact of DRAM capacity, operating frequency, and number of channels on power and performance to understand the main memory requirements of this important Big Data framework. The characterization results across various Hadoop MapReduce applications from different domains show that Hadoop MapReduce workloads exhibit one of two distinct behaviors: they are either CPU-intensive or disk-intensive. Our experimental results show that neither DRAM frequency nor the number of channels plays a significant role in the performance of Hadoop workloads. On the other hand, our results indicate that increasing the number of DRAM channels reduces DRAM power and improves the energy efficiency of Hadoop MapReduce applications.