Argus: Efficient Job Scheduling in RDMA-assisted Big Data Processing
Sijie Wu, Hanhua Chen, Yonghui Wang, Hai Jin
2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2021-05-01
DOI: 10.1109/IPDPS49936.2021.00092 (https://doi.org/10.1109/IPDPS49936.2021.00092)
Citations: 3
Abstract
Efficient job scheduling is an important and challenging problem in big data processing systems. Traditional designs commonly prioritize data locality during scheduling and follow a network-optimized principle to avoid costly data movement across the network. The emergence of high-performance Remote Direct Memory Access (RDMA) networks brings new opportunities for big data processing systems. However, existing RDMA-assisted designs ignore the dependencies among stages during scheduling, which can result in unsatisfactory system efficiency. In this work, we propose Argus, a novel RDMA-assisted job scheduler that achieves high resource utilization by fully exploiting the structural features of stage dependencies. Argus prioritizes the stages whose completion enables more stages to become schedulable. We implement Argus on top of RDMA-Spark and conduct comprehensive experiments to evaluate its performance using large-scale traces collected from real-world systems. Results show that, compared to state-of-the-art designs, Argus reduces job completion time and makespan by 38% and 31%, respectively.
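The prioritization idea in the abstract can be illustrated with a small sketch. This is not the Argus algorithm itself: the abstract does not give the exact priority metric, so the code below assumes a simple greedy rule that, among the stages whose parents have all finished, picks the one whose completion would make the most child stages schedulable. The function names (`newly_schedulable`, `schedule_order`), the one-stage-at-a-time execution model, and the alphabetical tie-break are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def newly_schedulable(stage, parents, children, done):
    """Count children of `stage` that would become schedulable once it finishes."""
    return sum(
        1
        for child in children[stage]
        if all(p == stage or p in done for p in parents[child])
    )


def schedule_order(parents):
    """Greedy order over a stage DAG given as {stage: [parent stages]}.

    Assumption (not from the paper): stages run one at a time, and ties are
    broken alphabetically.
    """
    # Invert the parent map so each stage knows its direct children.
    children = defaultdict(list)
    for stage, deps in parents.items():
        for dep in deps:
            children[dep].append(stage)

    done, order = set(), []
    while len(done) < len(parents):
        # A stage is ready when all of its parents have completed.
        ready = sorted(
            s for s in parents
            if s not in done and all(p in done for p in parents[s])
        )
        if not ready:
            raise ValueError("dependency cycle or missing parent stage")
        # max() returns the first maximal element, so sorting fixes the tie-break.
        best = max(ready, key=lambda s: newly_schedulable(s, parents, children, done))
        order.append(best)
        done.add(best)
    return order


if __name__ == "__main__":
    # Hypothetical DAG: B unlocks C and D; E needs both A and C.
    dag = {"A": [], "B": [], "C": ["B"], "D": ["B"], "E": ["A", "C"]}
    print(schedule_order(dag))
```

In this toy DAG, `B` is chosen before `A` because finishing `B` immediately makes two stages (`C` and `D`) schedulable, whereas finishing `A` alone unlocks nothing. A real scheduler would also have to account for resource demands, task parallelism, and RDMA transfer costs, none of which are modeled here.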