A Review on Job Scheduling for Hadoop MapReduce

2017 International Conference on Next Generation Computing and Information Systems (ICNGCIS). Pub Date: 2017-12-01. DOI: 10.1109/ICNGCIS.2017.40

Khushboo Kalia, N. Gupta


Citations: 3

Abstract

Hadoop is a Java-based distributed computing environment that not only stores but also processes vast volumes of data. Its HDFS (Hadoop Distributed File System) stores the data, and analytics is done by MapReduce. MapReduce is an emerging paradigm for handling huge data sets using shared-nothing clusters. Many organizations have already adopted MapReduce for their analytics work. To boost the performance and utilization of the shared cluster, different authors have proposed many scheduling mechanisms. MapReduce job scheduling faces several problems, such as locality, synchronization overhead, and fairness. This paper introduces the scheduling issues concerned with locality, synchronization, and fairness, and surveys the various approaches to handling these problems. In addition, it discusses the evaluation of the various scheduling algorithms, as well as methods for reducing synchronization overhead, such as asynchronous processing and speculative execution.
