LEMO-MR: Low Overhead and Elastic MapReduce Implementation Optimized for Memory and CPU-Intensive Applications

2010 IEEE Second International Conference on Cloud Computing Technology and Science. Publication date: 2010-11-30. DOI: 10.1109/CloudCom.2010.45

Zacharia Fadika, M. Govindaraju


Citations: 42

Abstract

Since its inception, MapReduce has frequently been associated with Hadoop and large-scale datasets. Its deployment at Amazon in the cloud, and its applications at Yahoo! and Facebook for large-scale distributed document indexing and database building, among other tasks, have thrust MapReduce to the forefront of the data processing application domain. The applicability of the paradigm, however, extends far beyond data-intensive applications and disk-based systems: it can also be brought to bear on small but CPU-intensive distributed applications. In this work, we focus both on the performance of processing large-scale hierarchical data in distributed scientific applications and on the processing of smaller but demanding input sizes primarily used in diskless, memory-resident I/O systems. We present LEMO-MR (Low overhead, Elastic, configurable for in-Memory applications, and on-Demand fault tolerance), an optimized implementation of MapReduce for both on-disk and in-memory applications. We describe its architecture and identify not only the necessary components of this model, but also the trade-offs and factors to be considered. We show the efficacy of our implementation in terms of the potential speedup that can be achieved for representative data sets used by cloud applications. Finally, we quantify the performance gains exhibited by our MapReduce implementation over Apache Hadoop in a compute-intensive environment.

