Symbiotic scheduling of concurrent GPU kernels for performance and energy optimizations
Teng Li, Vikram K. Narayana, T. El-Ghazawi
Proceedings of the 11th ACM Conference on Computing Frontiers, 2014-05-20
DOI: 10.1145/2597917.2597925
The incorporation of GPUs as co-processors has brought significant performance improvements to High-Performance Computing (HPC), making efficient utilization of GPU resources an important concern for computer scientists. To obtain the required performance while limiting energy consumption, researchers and vendors alike are seeking to apply traditional CPU techniques to the GPU computing domain. For instance, newer NVIDIA GPUs now support both concurrent execution of independent kernels and Dynamic Voltage and Frequency Scaling (DVFS). These developments open new opportunities for efficiently scheduling GPU computational kernels under performance and energy constraints. In this paper, we carry out performance and energy optimizations targeting the execution phases of concurrent kernels in GPU-based computing. When multiple GPU kernels are enqueued for concurrent execution, the order in which they are launched can significantly affect both the total execution time and the energy consumption. We attribute this behavior to the relative synergy among kernels that are launched close to one another. Accordingly, we define metrics that quantify the extent to which kernels are symbiotic, by modeling their complementary resource requirements and execution characteristics. We then propose a symbiotic scheduling algorithm to obtain the best possible kernel launch sequence for concurrent execution. Experimental results on an NVIDIA K20 GPU demonstrate the efficacy of the proposed algorithm-based approach, which achieves near-optimal results within the solution space in terms of both performance and energy consumption. Our further experimental study on DVFS finds that increasing the GPU frequency generally improves performance and saves energy; in light of this, the proposed approach reduces the need for over-clocking and can be readily adopted by programmers with minimal programming effort and risk.
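The abstract does not spell out the symbiosis metrics or the scheduling algorithm, so what follows is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes each kernel can be summarized by a hypothetical profile giving the fraction of SM compute throughput and of DRAM bandwidth it saturates when running alone. It then treats two kernels as symbiotic when their combined demand on each resource stays within 100%, and greedily builds a launch order in which each kernel follows its most complementary predecessor. An exhaustive search over all n! launch orders is included as a brute-force reference for small inputs. The names `KernelProfile`, `contention`, `sequence_contention`, `greedy_schedule`, and `exhaustive_schedule` are hypothetical, and the contention score is only a stand-in proxy for both execution time and energy.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class KernelProfile:
    # Hypothetical profile (an assumption, not from the paper): fraction (0..1)
    # of each GPU resource the kernel tends to saturate when running alone.
    name: str
    compute: float  # share of SM compute throughput
    memory: float   # share of DRAM bandwidth


def contention(a: KernelProfile, b: KernelProfile) -> float:
    """Hypothetical pairwise metric: total demand above 100% on each resource.
    0.0 means the two kernels are fully complementary (symbiotic)."""
    return (max(0.0, a.compute + b.compute - 1.0)
            + max(0.0, a.memory + b.memory - 1.0))


def sequence_contention(order: Sequence[KernelProfile]) -> float:
    """Sum of contention between each pair of consecutively launched kernels."""
    return sum(contention(x, y) for x, y in zip(order, order[1:]))


def greedy_schedule(kernels: Sequence[KernelProfile]) -> List[KernelProfile]:
    """Greedy launch order: start from the kernel with the largest single-resource
    demand, then repeatedly append the remaining kernel with the least
    contention against the kernel launched just before it."""
    remaining = list(kernels)
    first = max(remaining, key=lambda k: max(k.compute, k.memory))
    remaining.remove(first)
    order = [first]
    while remaining:
        nxt = min(remaining, key=lambda k: contention(order[-1], k))
        remaining.remove(nxt)
        order.append(nxt)
    return order


def exhaustive_schedule(kernels: Sequence[KernelProfile]) -> Tuple[KernelProfile, ...]:
    """Brute-force reference over all n! launch orders; only practical for small n."""
    return min(permutations(kernels), key=sequence_contention)


# Usage with made-up profiles: A and B lean compute-bound, C and D memory-bound.
kernels = [
    KernelProfile("A", compute=0.9, memory=0.1),
    KernelProfile("B", compute=0.7, memory=0.3),
    KernelProfile("C", compute=0.1, memory=0.8),
    KernelProfile("D", compute=0.3, memory=0.6),
]
order = greedy_schedule(kernels)
print([k.name for k in order])                                    # ['A', 'C', 'B', 'D']
print(round(sequence_contention(order), 3))                       # 0.1
print(round(sequence_contention(exhaustive_schedule(kernels)), 3))  # 0.1
```

The greedy pass interleaves compute-leaning and memory-leaning kernels and, on this toy input, matches the brute-force optimum at O(n²) cost instead of O(n!). On real hardware such an order would determine the sequence in which independent kernels are issued into separate CUDA streams; how closely a static contention proxy like this one tracks measured time and energy on a K20 is an assumption the sketch does not test.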