Nyami: a synthesizable GPU architectural model for general-purpose and graphics-specific workloads

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2015-03-29 DOI:10.1109/ISPASS.2015.7095803

Jeffrey T. Bush, Philip Dexter, Timothy N. Miller, A. Carpenter

{"title":"Nyami: a synthesizable GPU architectural model for general-purpose and graphics-specific workloads","authors":"Jeffrey T. Bush, Philip Dexter, Timothy N. Miller, A. Carpenter","doi":"10.1109/ISPASS.2015.7095803","DOIUrl":null,"url":null,"abstract":"Graphics processing units (GPUs) continue to grow in popularity for general-purpose, highly parallel, high-throughput systems. This has forced GPU vendors to increase their focus on general purpose workloads, sometimes at the expense of the graphics-specific workloads. Using GPUs for general-purpose computation is a departure from the driving forces behind programmable GPUs that were focused on a narrow subset of graphics rendering operations. Rather than focus on purely graphics-related or general-purpose use, we have designed and modeled an architecture that optimizes for both simultaneously to efficiently handle all GPU workloads. In this paper, we present Nyami, a co-optimized GPU architecture and simulation model with an open-source implementation written in Verilog. This approach allows us to more easily explore the GPU design space in a synthesizable, cycle-precise, modular environment. An instruction-precise functional simulator is provided for co-simulation and verification. Overall, we assume a GPU may be used as a general-purpose GPU (GPGPU) or a graphics engine and account for this in the architecture's construction and in the options and modules selectable for synthesis and simulation. To demonstrate Nyami's viability as a GPU research platform, we exploit its flexibility and modularity to explore the impact of a set of architectural decisions. These include sensitivity to cache size and associativity, barrel and switch-on-stall multithreaded instruction scheduling, and software vs. hardware implementations of rasterization. Through these experiments, we gain insight into commonly accepted GPU architecture decisions, adapt the architecture accordingly, and give examples of the intended use as a GPU research tool.","PeriodicalId":189378,"journal":{"name":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISPASS.2015.7095803","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 18

Abstract

Graphics processing units (GPUs) continue to grow in popularity for general-purpose, highly parallel, high-throughput systems. This has forced GPU vendors to increase their focus on general purpose workloads, sometimes at the expense of the graphics-specific workloads. Using GPUs for general-purpose computation is a departure from the driving forces behind programmable GPUs that were focused on a narrow subset of graphics rendering operations. Rather than focus on purely graphics-related or general-purpose use, we have designed and modeled an architecture that optimizes for both simultaneously to efficiently handle all GPU workloads. In this paper, we present Nyami, a co-optimized GPU architecture and simulation model with an open-source implementation written in Verilog. This approach allows us to more easily explore the GPU design space in a synthesizable, cycle-precise, modular environment. An instruction-precise functional simulator is provided for co-simulation and verification. Overall, we assume a GPU may be used as a general-purpose GPU (GPGPU) or a graphics engine and account for this in the architecture's construction and in the options and modules selectable for synthesis and simulation. To demonstrate Nyami's viability as a GPU research platform, we exploit its flexibility and modularity to explore the impact of a set of architectural decisions. These include sensitivity to cache size and associativity, barrel and switch-on-stall multithreaded instruction scheduling, and software vs. hardware implementations of rasterization. Through these experiments, we gain insight into commonly accepted GPU architecture decisions, adapt the architecture accordingly, and give examples of the intended use as a GPU research tool.

查看原文本刊更多论文

图形处理单元(gpu)在通用、高度并行、高吞吐量的系统中越来越受欢迎。这迫使GPU供应商增加对通用工作负载的关注，有时以牺牲特定于图形的工作负载为代价。使用gpu进行通用计算是对可编程gpu背后驱动力的背离，可编程gpu专注于图形渲染操作的狭窄子集。而不是专注于纯粹的图形相关或通用用途，我们已经设计和建模了一个架构，同时优化两者，以有效地处理所有GPU工作负载。在本文中，我们提出了Nyami，一个协同优化的GPU架构和仿真模型，并使用Verilog编写了一个开源实现。这种方法使我们能够在可合成的、周期精确的、模块化的环境中更容易地探索GPU设计空间。提供了一个指令精确的功能模拟器，用于联合仿真和验证。总体而言，我们假设GPU可以用作通用GPU (GPGPU)或图形引擎，并在架构的构建以及可用于合成和模拟的选项和模块中考虑到这一点。为了证明Nyami作为GPU研究平台的可行性，我们利用其灵活性和模块化来探索一组架构决策的影响。其中包括对缓存大小和关联性的敏感性，桶式和开关式暂停多线程指令调度，以及栅格化的软件与硬件实现。通过这些实验，我们深入了解了普遍接受的GPU架构决策，相应地调整了架构，并给出了作为GPU研究工具的预期用途的示例。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

自引率

0.00%

发文量