Decoupling the programming model from resource management in throughput processors

Nandita Vijaykumar, Kevin Hsieh, Gennady Pekhimenko, Samira Khan, Ashish Shrestha, Saugata Ghose, Adwait Jog, Phillip B. Gibbons, Onur Mutlu
{"title":"Decoupling the programming model from resource management in throughput processors","authors":"Nandita Vijaykumar, Kevin Hsieh, Gennady Pekhimenko, S. Khan, Ashish Shrestha, Saugata Ghose, Adwait Jog, Phillip B. Gibbons, O. Mutlu","doi":"10.1049/pbpc022e_ch4","DOIUrl":null,"url":null,"abstract":"This chapter introduces a new resource virtualization framework, Zorua, that decouples the graphics processing unit (GPU) programming model from the management of key on-chip resources in hardware to enhance programming ease, portability, and performance. The application resource specification-a static specification of several parameters such as the number of threads and the scratchpad memory usage per thread block-forms a critical component of the existing GPU programming models. This specification determines the parallelism, and, hence, performance of the application during execution because the corresponding on-chip hardware resources are allocated and managed purely based on this specification. This tight coupling between the software-provided resource specification and resource management in hardware leads to significant challenges in programming ease, portability, and performance, as we demonstrate in this chapter using real data obtained on state-of-the-art GPU systems. Our goal in this work is to reduce the dependence of performance on the software-provided static resource specification to simultaneously alleviate the above challenges. To this end, we introduce Zorua, a new resource virtualization framework, that decouples the programmer-specified resource usage of a GPU application from the actual allocation in the on-chip hardware resources. Zorua enables this decoupling by virtualizing each resource transparently to the programmer. The virtualization provided by Zorua builds on two key concepts-dynamic allocation of the on-chip resources and their oversubscription using a swap space in memory. Zorua provides a holistic GPU resource virtualization strategy designed to (i) adaptively control the extent of oversubscription and (ii) coordinate the dynamic management of multiple on-chip resources to maximize the effectiveness of virtualization.We demonstrate that by providing the illusion of more resources than physically available via controlled and coordinated virtualization, Zorua offers several important benefits: (i) Programming ease. It eases the burden on the programmer to provide code that is tuned to efficiently utilize the physically available on-chip resources. (ii) Portability. It alleviates the necessity of retuning an application's resource usage when porting the application across GPU generations. (iii) Performance. By dynamically allocating resources and carefully oversubscribing them when necessary, Zorua improves or retains the performance of applications that are already highly tuned to best utilize the resources. 
The holistic virtualization provided by Zorua has many other potential uses, e.g., fine-grained resource sharing among multiple kernels, low latency preemption of GPU programs, and support for dynamic parallelism, which we describe in this chapter.","PeriodicalId":254920,"journal":{"name":"Many-Core Computing: Hardware and Software","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Many-Core Computing: Hardware and Software","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1049/pbpc022e_ch4","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

This chapter introduces Zorua, a new resource virtualization framework that decouples the graphics processing unit (GPU) programming model from the management of key on-chip resources in hardware, in order to enhance programming ease, portability, and performance. The application resource specification, a static specification of several parameters such as the number of threads and the scratchpad memory usage per thread block, forms a critical component of existing GPU programming models. This specification determines the parallelism, and hence the performance, of the application during execution, because the corresponding on-chip hardware resources are allocated and managed purely based on this specification. This tight coupling between the software-provided resource specification and resource management in hardware leads to significant challenges in programming ease, portability, and performance, as we demonstrate in this chapter using real data obtained on state-of-the-art GPU systems.

Our goal in this work is to reduce the dependence of performance on the software-provided static resource specification, and thereby alleviate all of the above challenges simultaneously. To this end, we introduce Zorua, a new resource virtualization framework that decouples the programmer-specified resource usage of a GPU application from the actual allocation of on-chip hardware resources. Zorua enables this decoupling by virtualizing each resource transparently to the programmer. The virtualization provided by Zorua builds on two key concepts: dynamic allocation of the on-chip resources, and their oversubscription using a swap space in memory. Zorua provides a holistic GPU resource virtualization strategy designed to (i) adaptively control the extent of oversubscription and (ii) coordinate the dynamic management of multiple on-chip resources to maximize the effectiveness of virtualization.

We demonstrate that, by providing the illusion of more resources than are physically available via controlled and coordinated virtualization, Zorua offers several important benefits. (i) Programming ease: it eases the programmer's burden of providing code that is tuned to efficiently utilize the physically available on-chip resources. (ii) Portability: it alleviates the need to retune an application's resource usage when porting the application across GPU generations. (iii) Performance: by dynamically allocating resources and carefully oversubscribing them when necessary, Zorua improves or retains the performance of applications that are already highly tuned to best utilize the on-chip resources. The holistic virtualization provided by Zorua has many other potential uses, e.g., fine-grained resource sharing among multiple kernels, low-latency preemption of GPU programs, and support for dynamic parallelism, all of which we describe in this chapter.
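To make concrete the kind of static resource specification the abstract refers to, the following minimal CUDA sketch (not taken from the chapter; the kernel and all names are illustrative assumptions) fixes the number of threads per block and the scratchpad (shared memory) usage per block at compile/launch time. The occupancy query at the end shows how the hardware derives the number of resident thread blocks per streaming multiprocessor purely from this specification, which is the coupling that Zorua aims to relax.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Threads per block is fixed at compile time; together with the __shared__
// allocation below it forms the static resource specification that the
// hardware uses to decide how many thread blocks fit on each SM.
#define TILE 256

__global__ void scaleKernel(const float* in, float* out, float alpha, int n) {
    // Statically sized scratchpad (shared memory) allocation per thread block.
    __shared__ float tile[TILE];

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        tile[threadIdx.x] = in[idx];   // stage the element through the scratchpad
    }
    __syncthreads();
    if (idx < n) {
        out[idx] = alpha * tile[threadIdx.x];
    }
}

int main() {
    const int n = 1 << 20;
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    // The launch configuration fixes threads per block (TILE). Changing it,
    // or the __shared__ usage above, changes occupancy and hence performance
    // on a given GPU generation.
    int blocks = (n + TILE - 1) / TILE;
    scaleKernel<<<blocks, TILE>>>(d_in, d_out, 2.0f, n);
    cudaDeviceSynchronize();

    // The runtime computes resident blocks per SM directly from the static
    // specification (block size and per-block shared memory usage).
    int maxBlocksPerSM = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&maxBlocksPerSM, scaleKernel,
                                                  TILE, 0 /* dynamic smem */);
    printf("Resident thread blocks per SM for this specification: %d\n",
           maxBlocksPerSM);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Running the occupancy query on two different GPU generations typically reports different resident-block counts for the same specification, which is why a specification tuned for one device may underutilize another; Zorua's virtualization is intended to absorb exactly this kind of mismatch.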