Locality and Network-Aware Reduce Task Scheduling for Data-Intensive Applications

Engin Arslan, Mrigank Shekhar, T. Kosar
Published in: 2014 5th International Workshop on Data-Intensive Computing in the Clouds, 2014-11-16
DOI: 10.1109/DataCloud.2014.10 (https://doi.org/10.1109/DataCloud.2014.10)
Citations: 24

Abstract

MapReduce is one of the leading programming frameworks for implementing data-intensive applications; it splits work into map and reduce tasks that run on distributed servers. Although the literature contains a substantial amount of work on map task scheduling and optimization, work on reduce task scheduling is very limited. Effective scheduling of reduce tasks onto resources becomes especially important for the performance of data-intensive applications, where large amounts of data move between the map and reduce tasks. In this paper, we propose a new algorithm (LoNARS) for reduce task scheduling that takes both data locality and network traffic into consideration. Data locality awareness aims to schedule reduce tasks closer to the map tasks, decreasing both the delay in data access and the amount of traffic pushed onto the network. Network traffic awareness aims to distribute traffic over the whole network and minimize hotspots, reducing the effect of network congestion on data transfers. We have integrated LoNARS into Hadoop-1.2.1. Using LoNARS, we achieved up to a 15% gain in data shuffling time and up to a 3-4% improvement in total job completion time compared to other reduce task scheduling algorithms. Moreover, we reduced the amount of traffic on network switches by 15%, which helps reduce energy consumption considerably.
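The abstract names two signals, data locality and network traffic, without giving the scoring rule. The Python sketch below is only a toy illustration of how two such signals could be combined when choosing a node for a reduce task; it is not the LoNARS algorithm as published. The two-level rack topology, the `link_load` metric, the weights `alpha` and `beta`, and all function names are assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    rack: str
    link_load: float  # assumed metric: fraction of link capacity in use, 0.0 to 1.0


MAX_HOPS = 4  # assumed toy topology: node -> rack switch -> core -> rack switch -> node


def hop_distance(a: Node, b: Node) -> int:
    """Hops between two nodes in the assumed two-level topology."""
    if a.name == b.name:
        return 0
    return 2 if a.rack == b.rack else MAX_HOPS


def placement_cost(candidate, map_outputs, alpha=1.0, beta=1.0):
    """Hypothetical cost of running a reduce task on `candidate`.

    map_outputs is a list of (node, bytes) pairs describing where the
    intermediate data for this reduce task currently lives.
    """
    total = sum(size for _, size in map_outputs)
    if total == 0:
        locality = 0.0
    else:
        # Size-weighted average distance to the map outputs, scaled to [0, 1].
        weighted = sum(hop_distance(candidate, src) * size for src, size in map_outputs)
        locality = weighted / (total * MAX_HOPS)
    # Locality term favors nearby data; load term steers away from busy links.
    return alpha * locality + beta * candidate.link_load


def choose_reduce_node(candidates, map_outputs, alpha=1.0, beta=1.0):
    """Return the candidate with the lowest hypothetical placement cost."""
    return min(candidates, key=lambda n: placement_cost(n, map_outputs, alpha, beta))


a = Node("a", "rack1", 0.1)
b = Node("b", "rack1", 0.9)
c = Node("c", "rack2", 0.0)
outputs = [(a, 100), (b, 100)]
best = choose_reduce_node([a, b, c], outputs)
print(best.name)
```

In this toy setup, node `a` holds half of the map output and has a lightly loaded link, so it scores lowest. Setting `alpha=0.0` ignores locality entirely and shifts the choice to the idle node on the other rack, which shows why a scheduler of this kind would need to weigh both signals rather than either one alone.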