{"title":"MapReduce作业调度的比较综述","authors":"Dongjin Yoo, K. Sim","doi":"10.1109/CCIS.2011.6045089","DOIUrl":null,"url":null,"abstract":"MapReduce is an emerging paradigm for data intensive processing with support of cloud computing technology. MapReduce provides convenient programming interfaces to distribute data intensive works in a cluster environment. The strengths of MapReduce are fault tolerance, an easy programming structure and high scalability. A variety of applications have adopted MapReduce including scientific analysis, web data processing and high performance computing. Data Intensive computing systems, such as Hadoop and Dryad, should provide an efficient scheduling mechanism for enhanced utilization in a shared cluster environment. The problems of scheduling map-reduce jobs are mostly caused by locality and synchronization overhead. Also, there is a need to schedule multiple jobs in a shared cluster with fairness constraints. By introducing the scheduling problems with regards to locality, synchronization and fairness constraints, this paper reviews a collection of scheduling methods for handling these issues in MapReduce. In addition, this paper compares different scheduling methods evaluating their features, strengths and weaknesses. For resolving synchronization overhead, two categories of studies; asynchronous processing and speculative execution are discussed. For fairness constraints with locality improvement, delay scheduling in Hadoop and Quincy scheduler in Dryad are discussed.","PeriodicalId":128504,"journal":{"name":"2011 IEEE International Conference on Cloud Computing and Intelligence Systems","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"63","resultStr":"{\"title\":\"A comparative review of job scheduling for MapReduce\",\"authors\":\"Dongjin Yoo, K. Sim\",\"doi\":\"10.1109/CCIS.2011.6045089\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"MapReduce is an emerging paradigm for data intensive processing with support of cloud computing technology. MapReduce provides convenient programming interfaces to distribute data intensive works in a cluster environment. The strengths of MapReduce are fault tolerance, an easy programming structure and high scalability. A variety of applications have adopted MapReduce including scientific analysis, web data processing and high performance computing. Data Intensive computing systems, such as Hadoop and Dryad, should provide an efficient scheduling mechanism for enhanced utilization in a shared cluster environment. The problems of scheduling map-reduce jobs are mostly caused by locality and synchronization overhead. Also, there is a need to schedule multiple jobs in a shared cluster with fairness constraints. By introducing the scheduling problems with regards to locality, synchronization and fairness constraints, this paper reviews a collection of scheduling methods for handling these issues in MapReduce. In addition, this paper compares different scheduling methods evaluating their features, strengths and weaknesses. For resolving synchronization overhead, two categories of studies; asynchronous processing and speculative execution are discussed. 
For fairness constraints with locality improvement, delay scheduling in Hadoop and Quincy scheduler in Dryad are discussed.\",\"PeriodicalId\":128504,\"journal\":{\"name\":\"2011 IEEE International Conference on Cloud Computing and Intelligence Systems\",\"volume\":\"36 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-10-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"63\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 IEEE International Conference on Cloud Computing and Intelligence Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCIS.2011.6045089\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE International Conference on Cloud Computing and Intelligence Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCIS.2011.6045089","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A comparative review of job scheduling for MapReduce
MapReduce is an emerging paradigm for data-intensive processing supported by cloud computing technology. MapReduce provides convenient programming interfaces for distributing data-intensive work across a cluster. Its strengths are fault tolerance, a simple programming structure, and high scalability. A variety of applications have adopted MapReduce, including scientific analysis, web data processing, and high-performance computing. Data-intensive computing systems such as Hadoop and Dryad should provide an efficient scheduling mechanism to improve utilization in a shared cluster environment. The problems of scheduling MapReduce jobs stem mostly from data locality and synchronization overhead; in addition, multiple jobs must be scheduled in a shared cluster under fairness constraints. After introducing the scheduling problems related to locality, synchronization, and fairness, this paper reviews a collection of scheduling methods that address these issues in MapReduce and compares them, evaluating their features, strengths, and weaknesses. For reducing synchronization overhead, two categories of work are discussed: asynchronous processing and speculative execution. For fairness constraints combined with locality improvement, delay scheduling in Hadoop and the Quincy scheduler in Dryad are discussed.
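As a minimal illustration of the programming interface the abstract refers to (a hypothetical sketch, not code from the paper), a word-count job in plain Python might look like the following; the user supplies only a map function and a reduce function, and the framework handles partitioning, shuffling, and fault tolerance.

```python
from collections import defaultdict

# Minimal sketch of the MapReduce programming model (illustrative example,
# not from the paper): the user writes map_fn and reduce_fn; the runtime
# below is a sequential stand-in for the distributed framework.

def map_fn(_, line):
    # Emit (word, 1) for every word in an input line.
    for word in line.split():
        yield word, 1

def reduce_fn(word, counts):
    # Sum all partial counts for one key.
    yield word, sum(counts)

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Sequential stand-in for the distributed runtime.
    groups = defaultdict(list)
    for key, value in inputs:
        for k, v in map_fn(key, value):
            groups[k].append(v)          # "shuffle": group by intermediate key
    results = {}
    for k, vs in groups.items():
        for out_k, out_v in reduce_fn(k, vs):
            results[out_k] = out_v
    return results

if __name__ == "__main__":
    lines = enumerate(["map reduce map", "reduce cluster map"])
    print(run_mapreduce(lines, map_fn, reduce_fn))
    # {'map': 3, 'reduce': 2, 'cluster': 1}
```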
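Speculative execution, one of the two approaches to synchronization overhead mentioned above, launches backup copies of straggling tasks so a slow node does not delay the whole job. A hedged sketch of the idea, with the progress scores and the lag threshold assumed purely for illustration:

```python
# Hedged sketch of speculative execution for masking stragglers in
# MapReduce-style systems: when a task's progress lags well behind the
# average, a backup copy is launched on another node and the first copy
# to finish wins. The 0.2 lag threshold is an illustrative assumption.

LAG_THRESHOLD = 0.2  # assumed gap below mean progress that marks a straggler

def pick_speculative_candidates(progress_by_task):
    """progress_by_task: dict mapping task id -> progress score in [0, 1]."""
    running = {t: p for t, p in progress_by_task.items() if p < 1.0}
    if not running:
        return []
    mean_progress = sum(running.values()) / len(running)
    # Tasks lagging well behind the mean are candidates for a backup attempt.
    return [t for t, p in running.items() if p < mean_progress - LAG_THRESHOLD]

if __name__ == "__main__":
    print(pick_speculative_candidates({"m1": 0.9, "m2": 0.85, "m3": 0.3}))
    # ['m3']  -> launch a backup copy of task m3 on another node
```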
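The delay scheduling policy discussed for Hadoop can be summarized briefly: when the job at the head of the fairness order has no local data on the node that just freed a slot, it is skipped for a bounded number of scheduling opportunities instead of immediately launching a non-local task. A minimal sketch, in which the skip threshold and the Job fields are assumptions for illustration rather than the scheduler's actual API:

```python
# Hedged sketch of delay scheduling as used in Hadoop's fair scheduler:
# jobs are considered in fairness order, but a job without local data on
# the free node is skipped up to MAX_SKIPS times before the scheduler
# gives up on locality and assigns it a non-local task.

MAX_SKIPS = 3  # assumed bound on how long a job may wait for a local slot

class Job:
    def __init__(self, name, local_nodes):
        self.name = name
        self.local_nodes = set(local_nodes)  # nodes holding this job's input blocks
        self.skip_count = 0

def assign_task(free_node, jobs_in_fairness_order):
    """Return the job that should launch a task on free_node, or None."""
    for job in jobs_in_fairness_order:
        if free_node in job.local_nodes:
            job.skip_count = 0           # got a node-local slot, reset the delay
            return job
        job.skip_count += 1
        if job.skip_count > MAX_SKIPS:
            job.skip_count = 0           # waited long enough: accept a non-local task
            return job
        # otherwise skip this job for now and let the next job use the slot
    return None
```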