DELMA: Dynamically ELastic MapReduce Framework for CPU-Intensive Applications

Zacharia Fadika, M. Govindaraju
{"title":"DELMA: Dynamically ELastic MapReduce Framework for CPU-Intensive Applications","authors":"Zacharia Fadika, M. Govindaraju","doi":"10.1109/CCGrid.2011.71","DOIUrl":null,"url":null,"abstract":"Since its introduction, MapReduce implementations have been primarily focused towards static compute cluster sizes. In this paper, we introduce the concept of dynamic elasticity to MapReduce. We present the design decisions and implementation tradeoffs for DELMA, (Dynamically Elastic MapReduce), a framework that follows the MapReduce paradigm, just like Hadoop MapReduce, but that is capable of growing and shrinking its cluster size, as jobs are underway. In our study, we test DELMA in diverse performance scenarios, ranging from diverse node additions to node additions at various points in the application run-time with various dataset sizes. The applicability of the MapReduce paradigm extends far beyond its use with large-scale data intensive applications, and can also be brought to bear in processing long running distributed applications executing on small-sized clusters. In this work, we focus both on the performance of processing hierarchical data in distributed scientific applications, as well as the processing of smaller but demanding input sizes primarily used in small clusters. We run experiments for datasets that require CPU intensive processing, ranging in size from Millions of input data elements to process, up to over half a billion elements, and observe the positive scalability patterns exhibited by the system. We show that for such sizes, performance increases accordingly with data and cluster size increases. 
We conclude on the benefits of providing MapReduce with the capability of dynamically growing and shrinking its cluster configuration by adding and removing nodes during jobs, and explain the possibilities presented by this model.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"45","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGrid.2011.71","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 45

Abstract

Since its introduction, MapReduce implementations have been focused primarily on static compute cluster sizes. In this paper, we introduce the concept of dynamic elasticity to MapReduce. We present the design decisions and implementation tradeoffs for DELMA (Dynamically Elastic MapReduce), a framework that follows the MapReduce paradigm, just like Hadoop MapReduce, but is capable of growing and shrinking its cluster size while jobs are underway. In our study, we test DELMA in diverse performance scenarios, ranging from varied numbers of node additions to node additions at different points in the application run-time, across a range of dataset sizes. The applicability of the MapReduce paradigm extends far beyond its use with large-scale data-intensive applications; it can also be brought to bear on long-running distributed applications executing on small clusters. In this work, we focus both on the performance of processing hierarchical data in distributed scientific applications and on the processing of smaller but demanding input sizes typical of small clusters. We run experiments on datasets that require CPU-intensive processing, ranging in size from millions of input data elements up to over half a billion elements, and observe the positive scalability patterns exhibited by the system. We show that at these sizes, performance scales accordingly as data and cluster sizes increase. We conclude on the benefits of giving MapReduce the capability to dynamically grow and shrink its cluster configuration by adding and removing nodes during jobs, and explain the possibilities presented by this model.
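The core idea above is a MapReduce-style run whose worker pool can change size while a job is in flight. A minimal, hypothetical sketch of that idea in Python follows: workers pull map tasks from a shared queue, and new workers can join mid-job. This illustrates the elasticity concept only; it is not DELMA's actual implementation, and the names (`ElasticMapPool`, `add_node`) are invented for this sketch.

```python
import queue
import threading

class ElasticMapPool:
    """Toy map-phase runner whose 'cluster' can grow while a job runs."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.tasks = queue.Queue()       # shared work queue: the job's pending map tasks
        self.results = []
        self.lock = threading.Lock()
        self.stopping = threading.Event()

    def submit(self, data):
        """Enqueue the job's input elements as map tasks."""
        for item in data:
            self.tasks.put(item)

    def add_node(self):
        """Elasticity: a new worker ("node") joins and starts pulling tasks,
        even if the job is already underway."""
        threading.Thread(target=self._work, daemon=True).start()

    def _work(self):
        while not self.stopping.is_set():
            try:
                item = self.tasks.get(timeout=0.05)
            except queue.Empty:
                continue                  # idle: keep polling until shutdown
            out = self.map_fn(item)
            with self.lock:
                self.results.append(out)
            self.tasks.task_done()

    def wait(self):
        """Block until every enqueued task is processed, then retire workers."""
        self.tasks.join()
        self.stopping.set()
        return self.results

# A CPU-bound map function and a run that grows the pool mid-job.
pool = ElasticMapPool(lambda x: x * x)
pool.submit(range(100))
pool.add_node()
pool.add_node()          # initial "cluster" of two workers
pool.add_node()          # a third worker joins while tasks are still queued
squares = sorted(pool.wait())
```

Shrinking follows the same pattern in reverse: a worker observing a shutdown flag drains its current task and retires, leaving the remaining workers to finish the queue.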