Optimal Task Scheduling in MapReduce
Changjian Wang, Yuxing Peng, Junyi Liu, Mingxing Tang, Guangming Liu, Jinghua Feng, Pengfei You
2014 9th IEEE International Conference on Networking, Architecture, and Storage
Published: 2014-08-06
DOI: https://doi.org/10.1109/NAS.2014.26
Citations: 2
Abstract
The scheduling approach in MapReduce can cause a "long tail" problem because of unreasonable task assignment, and it incurs high scheduling overhead because of the large number of task scheduling operations. To address these problems, we propose a new task scheduling approach for MapReduce, named the "Iterative Task Scheduling Algorithm". The new approach tries to schedule map tasks according to the solution of an equation for the optimal task assignment, which effectively mitigates the "long tail" problem and significantly reduces the number of task scheduling operations. Two supporting methods are proposed: the first estimates the task execution times of nodes, and the second produces the optimal task assignment from the known task execution times of nodes. Comprehensive experiments on real log data from Ali Cloud verify the effectiveness of the new approach: the map runtime of the job is reduced by 23%.
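
The abstract does not state the assignment equation itself, so the following is only an illustrative sketch of the general idea of deriving a task assignment from known per-node execution times. It is not the paper's Iterative Task Scheduling Algorithm. It assumes that all map tasks are identical, that node `i` needs a known constant time `task_times[i]` per task, and that each node runs one task at a time. The function names `assign_tasks` and `makespan` are hypothetical.

```python
import heapq


def assign_tasks(task_times, num_tasks):
    """Return how many tasks each node should receive.

    Assumption (not from the paper): tasks are identical and node i
    takes task_times[i] time units per task, one task at a time.
    Each task goes to the node that would finish it earliest, which
    minimizes the finish time of the slowest node under this model.
    """
    counts = [0] * len(task_times)
    # Heap entries: (finish time if the node takes one more task, node index).
    heap = [(t, i) for i, t in enumerate(task_times)]
    heapq.heapify(heap)
    for _ in range(num_tasks):
        finish, i = heapq.heappop(heap)
        counts[i] += 1
        heapq.heappush(heap, (finish + task_times[i], i))
    return counts


def makespan(task_times, counts):
    """Time at which the last node finishes its assigned tasks."""
    return max(n * t for n, t in zip(counts, task_times))


if __name__ == "__main__":
    times = [1.0, 2.0, 4.0]  # hypothetical per-task times for three nodes
    plan = assign_tasks(times, 7)
    print(plan, makespan(times, plan))
```

Under this simplified model, faster nodes receive proportionally more tasks, so all nodes finish at about the same time and no slow node is left running a "long tail" of work. Because the whole assignment is computed up front, it can also be issued in one batch per node rather than one scheduling operation per task.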