Application-aware Memory System for Fair and Efficient Execution of Concurrent GPGPU Applications

Proceedings of Workshop on General Purpose Processing Using GPUs. Pub Date: 2014-03-01. DOI: 10.1145/2588768.2576780

Adwait Jog, Evgeny Bolotin, Zvika Guz, Mike Parker, S. Keckler, M. Kandemir, C. Das

Citations: 59

Abstract

The available computing resources in modern GPUs are growing with each new generation. However, as many general-purpose applications with limited thread scalability are tuned to take advantage of GPUs, the available compute resources might not be optimally utilized. To address this, modern GPUs will need to execute multiple kernels simultaneously. As current generations of GPUs (e.g., NVIDIA Kepler, AMD Radeon) already enable concurrent execution of kernels from the same application, in this paper we address the next logical step: executing multiple concurrent applications on GPUs. We show that while this paradigm has the potential to improve overall system performance, negative interactions in the memory system among concurrently executing applications can severely hamper both performance and fairness. We show that the current application-agnostic GPU memory system design can (1) lead to sub-optimal GPU performance; and (2) create significant imbalance in performance slowdowns across kernels. Thus, we argue that the GPU memory system should be augmented with application awareness. As one example of the applicability of this concept, we augment the memory system hardware with application awareness such that requests from different applications can be scheduled in a round-robin (RR) fashion while still preserving the benefits of the current first-ready FCFS (FR-FCFS) memory scheduling policy. Evaluations with different multi-application workloads demonstrate that the proposed memory scheduling policy, first-ready round-robin FCFS (FR-RR-FCFS), improves fairness and delivers better system performance compared to the existing FR-FCFS memory scheduling scheme.
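To make the scheduling idea in the abstract concrete, the sketch below contrasts FR-FCFS with one plausible reading of FR-RR-FCFS for a single DRAM bank. It is an illustration only, not the authors' hardware design: the `Request` record, the function names, the single-bank model, and the exact priority order (row-buffer hit first, then round-robin across applications, then oldest) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    """Hypothetical memory request for one DRAM bank."""
    app_id: int   # issuing application
    row: int      # DRAM row the request targets
    arrival: int  # arrival time; smaller means older


def fr_fcfs(queue: List[Request], open_row: Optional[int]) -> Optional[Request]:
    """Application-agnostic baseline: row-buffer hits first, then oldest."""
    if not queue:
        return None
    return min(queue, key=lambda r: (r.row != open_row, r.arrival))


def fr_rr_fcfs(queue: List[Request], open_row: Optional[int],
               last_app: int, num_apps: int) -> Optional[Request]:
    """Assumed FR-RR-FCFS order: row hit, then round-robin turn, then oldest."""
    if not queue:
        return None

    def rr_distance(r: Request) -> int:
        # 0 for the application right after last_app, num_apps - 1 for last_app.
        return (r.app_id - last_app - 1) % num_apps

    return min(queue,
               key=lambda r: (r.row != open_row, rr_distance(r), r.arrival))


def drain(queue: List[Request], policy: str, num_apps: int) -> List[Request]:
    """Serve every queued request and return them in service order."""
    pending = list(queue)
    open_row: Optional[int] = None
    last_app = num_apps - 1  # so that application 0 gets the first turn
    order: List[Request] = []
    while pending:
        if policy == "fr_fcfs":
            req = fr_fcfs(pending, open_row)
        else:
            req = fr_rr_fcfs(pending, open_row, last_app, num_apps)
        pending.remove(req)
        open_row, last_app = req.row, req.app_id
        order.append(req)
    return order


if __name__ == "__main__":
    # Application 0 floods the bank with row misses before application 1 arrives.
    reqs = [Request(0, 10, 0), Request(0, 11, 1),
            Request(0, 12, 2), Request(1, 20, 3)]
    print([r.app_id for r in drain(reqs, "fr_fcfs", 2)])     # [0, 0, 0, 1]
    print([r.app_id for r in drain(reqs, "fr_rr_fcfs", 2)])  # [0, 1, 0, 0]
```

In this toy workload every request misses the row buffer, so FR-FCFS falls back to age and serves application 1 last, whereas the round-robin tie-break lets it in after one request from application 0. Because row hits still take precedence under the assumed order, a request to the open row is served first regardless of whose turn it is, which is how the sketch keeps the first-ready benefit.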
