Hadoop Job Scheduling with Dynamic Task Splitting
YongLiang Xu, Wentong Cai
2015 International Conference on Cloud Computing Research and Innovation (ICCCRI), published 2015-10-26
DOI: https://doi.org/10.1109/ICCCRI.2015.31
Citations: 11
Abstract
Fairness and data locality are often in conflict in Hadoop job scheduling. During scheduling, it is not always possible to achieve data locality for all jobs or fairness for all users. Achieving pure fairness may compromise the data locality of jobs, which negatively affects performance, and vice versa. For example, a scheduler may sacrifice performance by scheduling tasks on non-data-local nodes. Alternatively, a scheduler may sacrifice fairness by giving up an available slot and waiting for a data-local node. The Dynamic Task Splitting Scheduler (DTSS) is proposed to mitigate the tradeoff between fairness and data locality during job scheduling. DTSS does so by dynamically splitting a task and executing the split portion immediately on a non-data-local node, which improves fairness. Analysis and experimental results show that it is possible to improve both fairness and performance by adjusting the proportion of the task that is split off. DTSS is shown to improve the makespan of different users in a cluster by 2% to 11% compared with delay scheduling in situations where data-local nodes are difficult to obtain. Lastly, experiments show that DTSS is not a suitable scheduler under conditions where jobs can obtain data-local nodes easily.
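The effect of the split proportion can be illustrated with a toy model. This is a minimal sketch under assumptions that are not taken from the paper: a task is treated as a fixed amount of work, a non-data-local node processes it at a lower rate than a data-local node, and a data-local slot becomes free after a known wait. The function names, parameters, and numbers below are all hypothetical and chosen only for illustration.

```python
def completion_time(split, work, local_rate, remote_rate, wait):
    """Finish time of a task under a toy split model (not the paper's model).

    `split` is the fraction of the work sent at once to a non-data-local
    node; the remainder waits `wait` time units for a data-local slot.
    """
    # The remote portion starts immediately but runs at the slower rate.
    remote_finish = split * work / remote_rate
    # The local portion runs faster but only starts after the wait.
    if split < 1.0:
        local_finish = wait + (1.0 - split) * work / local_rate
    else:
        local_finish = 0.0  # nothing left to run locally
    # The task is done when both portions are done.
    return max(remote_finish, local_finish)


def best_split(work, local_rate, remote_rate, wait, steps=100):
    """Grid-search the split fraction that minimises the finish time."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda s: completion_time(s, work, local_rate, remote_rate, wait),
    )


if __name__ == "__main__":
    # Hypothetical numbers: 100 units of work, local rate 10, remote rate 5,
    # and a 6-unit wait for a data-local slot.
    params = dict(work=100, local_rate=10, remote_rate=5, wait=6)
    s = best_split(**params)
    print("wait for local node :", completion_time(0.0, **params))
    print("run fully non-local :", completion_time(1.0, **params))
    print("best split", s, ":", completion_time(s, **params))
```

In this toy setting, an intermediate split finishes sooner than either waiting for the data-local node (split 0) or running entirely on the non-data-local node (split 1). When the wait is close to zero, the best split moves toward 0, which is consistent with the abstract's remark that DTSS is unsuitable when data-local nodes are easy to obtain.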