Title: A Review on Job Scheduling for Hadoop Mapreduce
Authors: Khushboo Kalia, N. Gupta
Venue: 2017 International Conference on Next Generation Computing and Information Systems (ICNGCIS)
Published: 2017-12-01
DOI: 10.1109/ICNGCIS.2017.40
Citations: 3
Abstract
Hadoop is a Java-based distributed computing environment that both stores and processes large volumes of data. Its HDFS (Hadoop Distributed File System) stores the data, and MapReduce performs the analytics. MapReduce is an emerging paradigm for handling huge data sets on shared-nothing clusters. Many organizations have already adopted MapReduce for their analytics work. To boost the performance and utilization of the shared cluster, different authors have proposed many scheduling mechanisms. MapReduce job scheduling faces several problems, such as locality, synchronization overhead, and fairness. This paper introduces the scheduling issues concerned with locality, synchronization, and fairness, and surveys the various approaches to handling them. In addition, it evaluates the various scheduling algorithms and discusses methods for reducing synchronization overhead, such as asynchronous processing and speculative execution.