MOMC: Multi-objective and Multi-constrained Scheduling Algorithm of Many Tasks in Hadoop

2014 Ninth International Conference on P2P, Parallel, Grid, Cloud and Internet Computing. Pub Date: 2014-11-08. DOI: 10.1109/3PGCIC.2014.40

Cristian Voicu, Florin Pop, C. Dobre, F. Xhafa


Citations: 8

Abstract

Although scheduling in distributed systems has been studied for many years, platforms and job types change constantly. New application requirements therefore call for specialized algorithms, especially when an application is deployed in a Cloud environment. Hadoop and its extensions are among the most important frameworks for large-scale data processing in Clouds. Hadoop ships with default schedulers such as FIFO, Fair Scheduler, Capacity Scheduler, and Hadoop on Demand. Each of these scheduling algorithms focuses on a single, different constraint. It is hard to satisfy multiple constraints and pursue several objectives at the same time. After summarizing the most common schedulers and the need that motivated each one when it appeared, this paper presents MOMC, a multi-objective and multi-constrained scheduling algorithm for many tasks in Hadoop. The MOMC implementation focuses on two objectives, avoiding resource contention and achieving an optimal cluster workload, and two constraints, deadline and budget. To compare the algorithms on different metrics, we use Scheduling Load Simulator, which is integrated into the Hadoop framework and reduces the time developers spend on testing. As a killer application that generates many tasks, we have chosen processing tasks for the Million Song Dataset, a data set containing metadata for one million commercially available songs.
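
The abstract does not give the details of MOMC, so the following is only a minimal sketch of the general idea of scheduling under two constraints (deadline and budget) while steering work away from busy nodes. It is not the authors' algorithm. The `Node` and `Task` fields, the cost model (price per second of run time), and the single finish-time score that stands in for both objectives are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    name: str
    speed: float         # work units processed per second (hypothetical model)
    cost_per_sec: float  # price per second of use (hypothetical model)
    queued_work: float   # work units already waiting on this node


@dataclass
class Task:
    work: float      # work units required
    deadline: float  # seconds from now
    budget: float    # maximum cost allowed for this task


def pick_node(task: Task, nodes: Sequence[Node]) -> Optional[Node]:
    """Return the feasible node with the earliest finish time, or None."""
    best = None
    best_finish = float("inf")
    for node in nodes:
        run_time = task.work / node.speed
        finish = node.queued_work / node.speed + run_time
        cost = run_time * node.cost_per_sec
        # Constraints: skip nodes that would miss the deadline or exceed the budget.
        if finish > task.deadline or cost > task.budget:
            continue
        # Objectives (collapsed into one score as an assumption): prefer the
        # node that finishes soonest, which favors lightly loaded nodes.
        if finish < best_finish:
            best, best_finish = node, finish
    return best


if __name__ == "__main__":
    nodes = [
        Node("fast", speed=4.0, cost_per_sec=2.0, queued_work=40.0),
        Node("slow", speed=1.0, cost_per_sec=0.5, queued_work=0.0),
    ]
    chosen = pick_node(Task(work=8.0, deadline=10.0, budget=5.0), nodes)
    print(chosen.name if chosen else "no feasible node")
```

In this sketch the constraints act as hard filters and the objectives as a ranking over the remaining nodes. A real scheduler could weigh the objectives separately instead of collapsing them into one score.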
