WOHA: Deadline-Aware Map-Reduce Workflow Scheduling Framework over Hadoop Clusters

Shen Li, Shaohan Hu, Shiguang Wang, Lu Su, T. Abdelzaher, Indranil Gupta, Richard Pace
Published in: 2014 IEEE 34th International Conference on Distributed Computing Systems, 2014-06-01
DOI: 10.1109/ICDCS.2014.18 (https://doi.org/10.1109/ICDCS.2014.18)
Citations: 67

Abstract

In this paper, we present WOHA, an efficient scheduling framework for deadline-aware Map-Reduce workflows. In data centers, complex backend data analysis often uses a workflow that contains tens or even hundreds of interdependent Map-Reduce jobs. Meeting the deadlines of these workflows is usually of crucial importance to businesses (for example, workflows tightly linked to time-sensitive advertisement placement optimizations can directly affect revenue). Popular Map-Reduce implementations, such as Hadoop, deal with independent Map-Reduce jobs rather than workflows of jobs. To simplify workflow submission, solutions like Oozie have emerged, which take a workflow configuration file as input and automatically submit its Hadoop jobs at the right time. This separation of information, in which Hadoop handles only resource allocation and Oozie handles only workflow topology, keeps the Hadoop master node out of complex workflow analysis, but it may unnecessarily lengthen workflow spans and thus cause more deadline misses. To address this problem while preserving the efficiency of the Hadoop master node, WOHA allows client nodes to generate scheduling plans locally, which the master node later uses as resource allocation hints. Under this framework design, we propose a novel scheduling algorithm that improves the deadline satisfaction ratio by dynamically assigning priorities among workflows based on their progress. We implement WOHA by extending Hadoop-1.2.1. Our experiments over an 80-server cluster show that WOHA increases the deadline satisfaction ratio by 10% compared to state-of-the-art solutions and scales up to tens of thousands of concurrently running workflows.
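The abstract does not spell out how priorities are derived from progress, so the following is only an illustrative sketch of the general idea, not WOHA's algorithm. It assumes a hypothetical linear plan (tasks are expected to complete evenly between submission and deadline) and gives the next free slot to the workflow that lags that plan the most. The names `Workflow`, `planned_done`, `lag`, and `pick_next` are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Workflow:
    name: str
    deadline: float        # absolute deadline, in seconds
    total_tasks: int       # tasks across all jobs in the workflow
    done_tasks: int = 0    # tasks finished so far
    submit_time: float = 0.0

    def planned_done(self, now: float) -> float:
        """Tasks the assumed linear plan expects to be finished by `now`."""
        span = self.deadline - self.submit_time
        if span <= 0:
            return float(self.total_tasks)
        # Clamp elapsed fraction to [0, 1] so the plan never exceeds total_tasks.
        frac = min(max((now - self.submit_time) / span, 0.0), 1.0)
        return frac * self.total_tasks

    def lag(self, now: float) -> float:
        """Positive when the workflow is behind the assumed plan."""
        return self.planned_done(now) - self.done_tasks


def pick_next(workflows: List[Workflow], now: float) -> Optional[Workflow]:
    """Return the unfinished workflow that lags its plan the most.

    Ties are broken in favor of the earlier deadline.
    """
    pending = [w for w in workflows if w.done_tasks < w.total_tasks]
    if not pending:
        return None
    return max(pending, key=lambda w: (w.lag(now), -w.deadline))
```

In this sketch the plan plays the role of a client-generated hint: it is computed from information the submitter already has, and the scheduler only compares observed progress against it when a slot frees up.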