Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing

2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)
Pub Date: 2016-03-12
DOI: 10.1109/HPCA.2016.7446078

Zhenning Wang, Jun Yang, R. Melhem, B. Childers, Youtao Zhang, M. Guo


Citations: 129

Abstract



Studies show that non-graphics programs can be less optimized for the GPU hardware, leading to significant resource under-utilization. Sharing the GPU among multiple programs can effectively improve utilization, which is particularly attractive to systems where many applications require access to the GPU (e.g., cloud computing). However, current GPUs lack proper architectural features to support sharing. Initial attempts are preliminary: they either provide only static sharing, which requires recompilation or code transformation, or they do not effectively improve GPU resource utilization. We propose Simultaneous Multikernel (SMK), a fine-grained dynamic sharing mechanism that fully utilizes resources within a streaming multiprocessor by exploiting the heterogeneity of different kernels. We propose several resource allocation strategies to improve system throughput while maintaining fairness. Our evaluation shows that for shared workloads with complementary resource occupancy, SMK improves GPU throughput by 52% over non-shared execution and 17% over a state-of-the-art design.
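The idea of "complementary resource occupancy" can be illustrated with a small toy model. The sketch below is not the allocation strategy from the paper. It assumes a hypothetical streaming multiprocessor with made-up resource limits (SM_LIMITS), two hypothetical kernels (KERNEL_A is register-heavy, KERNEL_B is shared-memory-heavy), and a simple round-robin placement of thread blocks (fill_sm). All names and numbers are assumptions chosen for illustration only.

```python
from dataclasses import dataclass

# Hypothetical per-SM resource limits (illustrative values, not from the paper).
SM_LIMITS = {"registers": 65536, "shared_mem": 49152, "threads": 2048}


@dataclass(frozen=True)
class Kernel:
    """Resources consumed by ONE thread block of a kernel (hypothetical numbers)."""
    name: str
    registers: int
    shared_mem: int
    threads: int

    def demand(self):
        return {
            "registers": self.registers,
            "shared_mem": self.shared_mem,
            "threads": self.threads,
        }


def fits(kernel, free):
    """True if one more thread block of `kernel` fits in the free resources."""
    return all(free[r] >= need for r, need in kernel.demand().items())


def fill_sm(kernels):
    """Place thread blocks round-robin over `kernels` until no block fits.

    Returns (blocks per kernel name, fraction of each SM resource in use).
    Round-robin is an assumption of this sketch, not the paper's policy.
    """
    free = dict(SM_LIMITS)
    blocks = {k.name: 0 for k in kernels}
    progress = True
    while progress:
        progress = False
        for k in kernels:
            if fits(k, free):
                for r, need in k.demand().items():
                    free[r] -= need
                blocks[k.name] += 1
                progress = True
    used = {r: 1 - free[r] / SM_LIMITS[r] for r in SM_LIMITS}
    return blocks, used


# Two hypothetical kernels whose bottleneck resources differ.
KERNEL_A = Kernel("A", registers=16384, shared_mem=0, threads=256)
KERNEL_B = Kernel("B", registers=4096, shared_mem=12288, threads=256)

if __name__ == "__main__":
    for label, group in [("A alone", [KERNEL_A]),
                         ("B alone", [KERNEL_B]),
                         ("A and B shared", [KERNEL_A, KERNEL_B])]:
        blocks, used = fill_sm(group)
        print(label, blocks, used)
```

In this toy setting, each kernel running alone saturates one resource and leaves the others partly idle (A fills the registers, B fills the shared memory, and each uses only half of the thread slots). Sharing the same SM places three blocks of A and four blocks of B, which fills both registers and shared memory and raises thread-slot usage to 87.5%. Real hardware has further constraints (for example, limits on resident blocks and warps) that this sketch ignores.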

