gpu中高效控制流处理的动态逐曲再收敛堆栈

2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) Pub Date : 2016-07-01 DOI:10.1109/ISVLSI.2016.35

Yaohua Wang, Xiaowen Chen, Dong Wang, Sheng Liu

{"title":"gpu中高效控制流处理的动态逐曲再收敛堆栈","authors":"Yaohua Wang, Xiaowen Chen, Dong Wang, Sheng Liu","doi":"10.1109/ISVLSI.2016.35","DOIUrl":null,"url":null,"abstract":"GPGPUs usually experience performance degradation when the control flow of threads diverges in a warp. Reconvergence stack based control flow handling scheme is widely adopted in GPU architectures. The depth of such stack is always set to a large number, so that there can be enough entries for warps experiencing nested branches. However, for warps experiencing simple branches or even no branches, those deep reconvergence stacks would stay idle, causing a serious waste of hardware resource. Moreover, with the development of GPU architectures, more and more warps will be deployed on a GPU stream processor core, such problem could be even more serious. To solve this problem, this paper propose a dynamic reconvergence stack structure, in which a stack pool is shared by all the warps, and dynamic stacks of different warps can be constructed according to the run-time requirement. This can satisfy the stack requirement while eliminating unnecessary waste of hardware resource. Our experiments show that the dynamic reconvergence stack can reduce the cost of stack by 50% with the conventional performance well maintained.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Dynamic Per-Warp Reconvergence Stack for Efficient Control Flow Handling in GPUs\",\"authors\":\"Yaohua Wang, Xiaowen Chen, Dong Wang, Sheng Liu\",\"doi\":\"10.1109/ISVLSI.2016.35\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"GPGPUs usually experience performance degradation when the control flow of threads diverges in a warp. Reconvergence stack based control flow handling scheme is widely adopted in GPU architectures. The depth of such stack is always set to a large number, so that there can be enough entries for warps experiencing nested branches. However, for warps experiencing simple branches or even no branches, those deep reconvergence stacks would stay idle, causing a serious waste of hardware resource. Moreover, with the development of GPU architectures, more and more warps will be deployed on a GPU stream processor core, such problem could be even more serious. To solve this problem, this paper propose a dynamic reconvergence stack structure, in which a stack pool is shared by all the warps, and dynamic stacks of different warps can be constructed according to the run-time requirement. This can satisfy the stack requirement while eliminating unnecessary waste of hardware resource. Our experiments show that the dynamic reconvergence stack can reduce the cost of stack by 50% with the conventional performance well maintained.\",\"PeriodicalId\":140647,\"journal\":{\"name\":\"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISVLSI.2016.35\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISVLSI.2016.35","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

当线程的控制流在扭曲中发散时，gpgpu通常会经历性能下降。基于再收敛堆栈的控制流处理方案在GPU体系结构中被广泛采用。这种堆栈的深度总是设置为较大的数字，以便有足够的条目用于经历嵌套分支的warp。然而，对于经历简单分支甚至没有分支的warp，这些深度再收敛堆栈将处于闲置状态，造成严重的硬件资源浪费。此外，随着GPU架构的发展，越来越多的warp将部署在GPU流处理器核心上，这一问题可能会更加严重。为了解决这一问题，本文提出了一种动态再收敛堆栈结构，该结构中所有的warp共享一个堆栈池，并可以根据运行时的需要构建不同warp的动态堆栈。这样既能满足堆栈需求，又能避免不必要的硬件资源浪费。实验表明，动态再收敛堆栈在保持传统性能的前提下，可以将堆栈成本降低50%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Dynamic Per-Warp Reconvergence Stack for Efficient Control Flow Handling in GPUs

GPGPUs usually experience performance degradation when the control flow of threads diverges in a warp. Reconvergence stack based control flow handling scheme is widely adopted in GPU architectures. The depth of such stack is always set to a large number, so that there can be enough entries for warps experiencing nested branches. However, for warps experiencing simple branches or even no branches, those deep reconvergence stacks would stay idle, causing a serious waste of hardware resource. Moreover, with the development of GPU architectures, more and more warps will be deployed on a GPU stream processor core, such problem could be even more serious. To solve this problem, this paper propose a dynamic reconvergence stack structure, in which a stack pool is shared by all the warps, and dynamic stacks of different warps can be constructed according to the run-time requirement. This can satisfy the stack requirement while eliminating unnecessary waste of hardware resource. Our experiments show that the dynamic reconvergence stack can reduce the cost of stack by 50% with the conventional performance well maintained.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

自引率

0.00%

发文量