MapReuse: Reusing Computation in an In-Memory MapReduce System

2014 IEEE 28th International Parallel and Distributed Processing Symposium
Pub Date: 2014-05-19
DOI: 10.1109/IPDPS.2014.18

Devesh Tiwari, Yan Solihin

Citations: 15

Abstract

The MapReduce programming model is increasingly being adopted for data-intensive high-performance computing. It has recently been observed that in data-intensive environments, programs are often run multiple times with identical or slightly changed input, which creates a significant opportunity for computation reuse. Recognizing this opportunity, researchers have proposed techniques to reuse computation in disk-based MapReduce systems such as Hadoop, but not in in-memory MapReduce (IMMR) systems such as Phoenix. In this paper, we propose a novel technique for computation reuse in IMMR systems, which we refer to as MapReuse. MapReuse detects similarity between inputs by comparing their signatures. It skips recomputing output for a repeated portion of the input, computes output for a new portion of the input, and removes output that corresponds to a deleted portion of the input. MapReuse is built on top of an existing IMMR system, leaving it largely unmodified. MapReuse significantly speeds up IMMR, even when the new input differs from the original input by 25%.
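The reuse idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use the Phoenix API. It assumes, purely for illustration, that the input arrives as a list of text chunks, that a chunk's signature is its SHA-256 digest, and that the map function is a word count. The names `signature`, `map_word_count`, and `ReuseCache` are hypothetical.

```python
import hashlib
from collections import Counter


def signature(chunk: str) -> str:
    """Assumed signature scheme: SHA-256 digest of the chunk's bytes."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def map_word_count(chunk: str) -> Counter:
    """Illustrative map function: count whitespace-separated words."""
    return Counter(chunk.split())


class ReuseCache:
    """Keeps per-chunk map outputs in memory, keyed by chunk signature."""

    def __init__(self) -> None:
        self.map_outputs: dict[str, Counter] = {}
        self.map_calls = 0  # how many chunks were actually recomputed

    def run(self, chunks: list[str]) -> Counter:
        current: dict[str, Counter] = {}
        total: Counter = Counter()
        for chunk in chunks:
            sig = signature(chunk)
            if sig not in current:
                if sig in self.map_outputs:
                    # Repeated portion of the input: reuse the stored output.
                    current[sig] = self.map_outputs[sig]
                else:
                    # New portion of the input: compute its output.
                    current[sig] = map_word_count(chunk)
                    self.map_calls += 1
            # Reduce step: merge this chunk's output into the result.
            total.update(current[sig])
        # Outputs of chunks absent from this input are dropped here.
        self.map_outputs = current
        return total


if __name__ == "__main__":
    cache = ReuseCache()
    print(cache.run(["a b a", "c d", "e e"]), cache.map_calls)
    print(cache.run(["a b a", "f g", "e e"]), cache.map_calls)
```

In this sketch the second run recomputes only the one changed chunk, and the output of the removed chunk no longer contributes to the result. Chunk boundaries and the signature function are design choices that the abstract does not specify.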



