BAR: An Efficient Data Locality Driven Task Scheduling Algorithm for Cloud Computing

Jiahui Jin, Junzhou Luo, Aibo Song, Fang Dong, Runqun Xiong
Published in: 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (2011-05-23)
DOI: 10.1109/CCGrid.2011.55 (https://doi.org/10.1109/CCGrid.2011.55)
Citations: 150

Abstract

Large-scale data processing has become increasingly common in cloud computing systems such as MapReduce, Hadoop, and Dryad. In these systems, files are split into many small blocks, and every block is replicated across several servers. To process files efficiently, each job is divided into many tasks, and each task is allocated to a server to process one file block. Because network bandwidth is a scarce resource in these systems, enhancing task data locality (placing tasks on servers that contain their input blocks) is crucial to job completion time. Although many approaches to improving data locality have been proposed, most are either greedy and ignore global optimization or suffer from high computational complexity. To address these problems, we propose a heuristic task scheduling algorithm called Balance-Reduce (BAR), which first produces an initial task allocation and then gradually reduces the job completion time by tuning that allocation. By taking a global view, BAR can adjust data locality dynamically according to network state and cluster workload. Simulation results show that BAR can handle large problem instances in a few seconds and outperforms previous related algorithms in terms of job completion time.
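The abstract gives only the two-phase outline (produce an initial allocation, then tune it to shorten completion time), so the following is a minimal illustrative sketch of that idea and not the algorithm from the paper. Everything in it is an assumption made for illustration: the function name `balance_reduce_sketch`, the fixed `local_cost` and `remote_cost` values standing in for network state, and the specific rule of moving one task at a time from the busiest server to the least-loaded one.

```python
from typing import Dict, List, Tuple


def balance_reduce_sketch(
    replicas: Dict[str, List[str]],  # task -> servers that hold its input block
    servers: List[str],
    local_cost: float = 1.0,   # assumed run time of a data-local task
    remote_cost: float = 3.0,  # assumed run time when the block is fetched remotely
) -> Tuple[Dict[str, str], float]:
    """Return (task -> server placement, estimated job completion time)."""
    load = {s: 0.0 for s in servers}
    placement: Dict[str, str] = {}
    cost: Dict[str, float] = {}

    # Phase 1 (initial allocation): run every task data-local, on the
    # least-loaded server among those holding its input block.
    for task in sorted(replicas):
        best = min(replicas[task], key=lambda s: (load[s], s))
        placement[task] = best
        cost[task] = local_cost
        load[best] += local_cost

    def cost_on(task: str, server: str) -> float:
        return local_cost if server in replicas[task] else remote_cost

    # Phase 2 (tuning): move one task from the busiest server to the
    # least-loaded one as long as the move leaves the target strictly
    # below the busiest server's current load.
    while True:
        busiest = max(servers, key=lambda s: (load[s], s))
        idlest = min(servers, key=lambda s: (load[s], s))
        candidates = [t for t in placement if placement[t] == busiest]
        if not candidates:
            break
        # Prefer the task that is cheapest to run on the target server.
        task = min(candidates, key=lambda t: (cost_on(t, idlest), t))
        new_cost = cost_on(task, idlest)
        if load[idlest] + new_cost >= load[busiest]:
            break  # no further improvement from this kind of move
        load[busiest] -= cost[task]
        load[idlest] += new_cost
        placement[task] = idlest
        cost[task] = new_cost

    return placement, max(load.values())
```

With four tasks whose blocks all live on one of two servers, the data-local allocation finishes in 4 time units under these assumed costs, while giving up locality for one task brings the estimate down to 3. A higher `remote_cost` (a more congested network, in this toy model) would make the same move unprofitable, which is the trade-off between locality and load that the abstract describes.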