Zhenning Wang, Jun Yang, R. Melhem, B. Childers, Youtao Zhang, M. Guo
{"title":"同步多内核GPU:通过细粒度共享的多任务吞吐量处理器","authors":"Zhenning Wang, Jun Yang, R. Melhem, B. Childers, Youtao Zhang, M. Guo","doi":"10.1109/HPCA.2016.7446078","DOIUrl":null,"url":null,"abstract":"Studies show that non-graphics programs can be less optimized for the GPU hardware, leading to significant resource under-utilization. Sharing the GPU among multiple programs can effectively improve utilization, which is particularly attractive to systems where many applications require access to the GPU (e.g., cloud computing). However, current GPUs lack proper architecture features to support sharing. Initial attempts are preliminary: They either provide only static sharing, which requires recompilation or code transformation, or they do not effectively improve GPU resource utilization. We propose Simultaneous Multikernel (SMK), a fine-grain dynamic sharing mechanism, that fully utilizes resources within a streaming multiprocessor by exploiting heterogeneity of different kernels. We propose several resource allocation strategies to improve system throughput while maintaining fairness. Our evaluation shows that for shared workloads with complementary resource occupancy, SMK improves GPU throughput by 52% over non-shared execution and 17% over a state-of-the-art design.","PeriodicalId":417994,"journal":{"name":"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"129","resultStr":"{\"title\":\"Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing\",\"authors\":\"Zhenning Wang, Jun Yang, R. Melhem, B. Childers, Youtao Zhang, M. Guo\",\"doi\":\"10.1109/HPCA.2016.7446078\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Studies show that non-graphics programs can be less optimized for the GPU hardware, leading to significant resource under-utilization. Sharing the GPU among multiple programs can effectively improve utilization, which is particularly attractive to systems where many applications require access to the GPU (e.g., cloud computing). However, current GPUs lack proper architecture features to support sharing. Initial attempts are preliminary: They either provide only static sharing, which requires recompilation or code transformation, or they do not effectively improve GPU resource utilization. We propose Simultaneous Multikernel (SMK), a fine-grain dynamic sharing mechanism, that fully utilizes resources within a streaming multiprocessor by exploiting heterogeneity of different kernels. We propose several resource allocation strategies to improve system throughput while maintaining fairness. Our evaluation shows that for shared workloads with complementary resource occupancy, SMK improves GPU throughput by 52% over non-shared execution and 17% over a state-of-the-art design.\",\"PeriodicalId\":417994,\"journal\":{\"name\":\"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"volume\":\"26 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-03-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"129\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.2016.7446078\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2016.7446078","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing
Studies show that non-graphics programs can be less optimized for the GPU hardware, leading to significant resource under-utilization. Sharing the GPU among multiple programs can effectively improve utilization, which is particularly attractive to systems where many applications require access to the GPU (e.g., cloud computing). However, current GPUs lack proper architecture features to support sharing. Initial attempts are preliminary: They either provide only static sharing, which requires recompilation or code transformation, or they do not effectively improve GPU resource utilization. We propose Simultaneous Multikernel (SMK), a fine-grain dynamic sharing mechanism, that fully utilizes resources within a streaming multiprocessor by exploiting heterogeneity of different kernels. We propose several resource allocation strategies to improve system throughput while maintaining fairness. Our evaluation shows that for shared workloads with complementary resource occupancy, SMK improves GPU throughput by 52% over non-shared execution and 17% over a state-of-the-art design.