ADAPT: Availability-Aware MapReduce Data Placement for Non-dedicated Distributed Computing

2012 IEEE 32nd International Conference on Distributed Computing Systems. Pub Date: 2012-06-18. DOI: 10.1109/ICDCS.2012.48

Hui Jin, Xi Yang, Xian-He Sun, I. Raicu

Pages: 516-525

Citations: 75

Abstract

The MapReduce programming paradigm has grown increasingly popular because of its ease of programming, built-in data distribution, and fault tolerance. Its low barrier to adoption makes it a promising framework for non-dedicated distributed computing environments. However, the variability of host resources and availability in such environments can substantially degrade the performance of MapReduce applications. The replication-based fault tolerance mechanism alleviates some of these problems, but at the cost of inefficient storage space utilization. Running MapReduce applications in non-dedicated environments at lower cost therefore calls for intelligent solutions that guarantee application performance with a low degree of data replication. In this research, we propose an Availability-aware Data Placement (ADAPT) strategy that improves application performance without extra storage cost. The basic idea of ADAPT is to dispatch data based on the availability of each node, thereby reducing network traffic, improving data locality, and optimizing application performance. We implement a prototype of ADAPT within the Hadoop framework, an open-source implementation of MapReduce, and evaluate its performance in an emulated non-dedicated distributed environment. The experimental results show that ADAPT can improve performance by more than 30%. ADAPT achieves high reliability without the need for additional data replication. ADAPT has also been evaluated for large-scale computing environments through simulations, with promising results.
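The abstract does not spell out the placement model, so the following is only a minimal sketch of the general idea of dispatching data according to node availability. It assumes each node has a single availability score (the fraction of time it is expected to be usable) and splits a file's blocks in proportion to that score. The function name `place_blocks`, the score itself, and the proportional rule are illustrative assumptions, not the authors' algorithm or any Hadoop API.

```python
from typing import Dict


def place_blocks(availability: Dict[str, float], num_blocks: int) -> Dict[str, int]:
    """Split num_blocks across nodes in proportion to each node's availability.

    availability maps a node name to a positive score, e.g. the fraction of
    time the node is expected to be usable. This weighting is an illustrative
    assumption, not the model from the paper.
    """
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    if not availability or any(a <= 0 for a in availability.values()):
        raise ValueError("every node needs a positive availability")

    total = sum(availability.values())
    # Ideal (fractional) share of blocks for each node.
    ideal = {n: num_blocks * a / total for n, a in availability.items()}
    # Give each node the integer part of its share first.
    counts = {n: int(share) for n, share in ideal.items()}
    leftover = num_blocks - sum(counts.values())
    # Hand the remaining blocks to the nodes with the largest fractional parts.
    by_remainder = sorted(ideal, key=lambda n: ideal[n] - counts[n], reverse=True)
    for n in by_remainder[:leftover]:
        counts[n] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical cluster: node "a" is available twice as often as "b" or "c".
    print(place_blocks({"a": 0.5, "b": 0.25, "c": 0.25}, 8))
```

With the hypothetical scores above, node `a` receives 4 of the 8 blocks and nodes `b` and `c` receive 2 each, so the less available nodes hold less of the data that map tasks would otherwise have to fetch remotely or recompute. A real placement policy would also have to account for replication and network topology, which this sketch ignores.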


