A Review on Job Scheduling for Hadoop Mapreduce

Khushboo Kalia, N. Gupta
Published in: 2017 International Conference on Next Generation Computing and Information Systems (ICNGCIS), 2017-12-01
DOI: https://doi.org/10.1109/ICNGCIS.2017.40
Citations: 3

Abstract

Hadoop is a Java-based distributed computing environment that both stores and processes very large volumes of data. Its HDFS (Hadoop Distributed File System) stores the data, and MapReduce performs the analytics. MapReduce is an emerging paradigm for handling huge data sets on shared-nothing clusters, and many organizations have already adopted it for their analytics work. To boost the performance and utilization of a shared cluster, different authors have proposed many scheduling mechanisms. MapReduce job scheduling faces several problems, such as locality, synchronization overhead, and fairness. After introducing the scheduling issues related to locality, synchronization, and fairness, this paper surveys the approaches used to handle them. It also discusses the evaluation of various scheduling algorithms, along with methods such as asynchronous processing and speculative execution for reducing synchronization overhead.
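
As a rough illustration of the locality issue named in the abstract, the sketch below shows one simple way a scheduler might prefer data-local tasks. It is not taken from the surveyed paper or from Hadoop's own schedulers; the function `pick_task`, the node names, and the `block_locations` mapping are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def pick_task(free_node: str,
              pending: List[str],
              block_locations: Dict[str, Set[str]]) -> Optional[str]:
    """Return a pending task for free_node, preferring data-local ones.

    Hypothetical helper for illustration only, not a Hadoop API.
    """
    for task in pending:
        # Data-local: a replica of the task's input block lives on this node.
        if free_node in block_locations.get(task, set()):
            return task
    # No local task: fall back to the oldest pending task (remote read).
    return pending[0] if pending else None


# Assumed toy cluster state: task id -> nodes holding its input block.
pending = ["t1", "t2", "t3"]
block_locations = {"t1": {"nodeA"}, "t2": {"nodeB"}, "t3": {"nodeA", "nodeC"}}

local_choice = pick_task("nodeB", pending, block_locations)     # data-local pick
fallback_choice = pick_task("nodeD", pending, block_locations)  # no local data
```

In this toy setup the free node `nodeB` receives the task whose block it already holds, while `nodeD`, which holds no relevant block, simply gets the first pending task. Always falling back immediately favors utilization over locality; a real scheduler could trade these off differently.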