Slate: Enabling Workload-Aware Efficient Multiprocessing for Modern GPGPUs
Tyler N. Allen, Xizhou Feng, Rong Ge
2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Published: 2019-05-01
DOI: 10.1109/IPDPS.2019.00035 (https://doi.org/10.1109/IPDPS.2019.00035)
Citations: 15
Abstract
As GPUs now contribute the majority of computing power for HPC and data centers, improving GPU utilization has become an important research problem. Sharing a GPU among multiple kernels is an effective approach but requires judicious kernel selection and scheduling for optimal gains. In this paper, we present Slate, a software-based, workload-aware GPU multiprocessing framework that enables concurrent kernels from different processes to share GPU devices. At run time, Slate selects concurrent kernels that have complementary resource demands, which minimizes interference between individual kernels and improves GPU resource utilization. Slate adjusts the size of application kernels on the fly so that kernels readily share, release, and claim resources based on GPU status. It further controls overhead, including data transfers and synchronization. We have built a prototype of Slate and evaluated it on a system with an NVIDIA Titan Xp card. Our experiments show that, for the tested applications, Slate improves system throughput by 11% on average and by up to 35% in the best case compared with NVIDIA Multi-Process Service (MPS), which uses hardware scheduling and the leftover policy for resource sharing.
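The abstract does not spell out how Slate identifies kernels with complementary resource demands. As a rough illustration of the general idea only, and not the authors' algorithm, the sketch below scores each pair of pending kernels by how far their combined compute and memory-bandwidth demands would exceed device capacity, then picks the pair with the lowest score. The `KernelProfile` type, the `contention` and `pick_pair` functions, the kernel names, and all demand figures are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class KernelProfile:
    """Hypothetical per-kernel demand, as fractions of device capacity."""
    name: str
    compute: float    # share of compute throughput used when running alone
    bandwidth: float  # share of memory bandwidth used when running alone


def contention(a, b):
    """Sum of combined demand that exceeds capacity (1.0) on each resource."""
    over_compute = max(0.0, a.compute + b.compute - 1.0)
    over_bandwidth = max(0.0, a.bandwidth + b.bandwidth - 1.0)
    return over_compute + over_bandwidth


def pick_pair(pending):
    """Return the pair of pending kernels with the lowest contention score."""
    return min(combinations(pending, 2), key=lambda pair: contention(*pair))


# Made-up profiles: two compute-heavy kernels and one bandwidth-heavy kernel.
kernels = [
    KernelProfile("gemm", compute=0.9, bandwidth=0.2),
    KernelProfile("stencil", compute=0.8, bandwidth=0.5),
    KernelProfile("stream_copy", compute=0.1, bandwidth=0.9),
]

first, second = pick_pair(kernels)
print(first.name, second.name)  # prints "gemm stream_copy"
```

With these made-up numbers, the compute-bound `gemm` is paired with the bandwidth-bound `stream_copy` rather than with the other compute-bound kernel, which is the kind of complementary pairing the abstract describes. A real scheduler would also need measured profiles and would have to account for the kernel resizing and overhead control that the abstract mentions.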