Warped-Compression: Enabling power efficient GPUs through register compression

Sangpil Lee, Keunsoo Kim, Gunjae Koo, Hyeran Jeon, W. Ro, M. Annavaram
Published in: 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), pp. 502-514. Publication date: 2015-06-13. DOI: 10.1145/2749469.2750417 (https://doi.org/10.1145/2749469.2750417)
Citations: 101

Abstract

This paper presents Warped-Compression, a warp-level register compression scheme for reducing GPU power consumption. This work is motivated by the observation that the register values of threads within the same warp are similar; namely, the arithmetic difference between the registers of two successive threads is small. Removing data redundancy of register values through register compression reduces the effective register width, thereby enabling power reduction opportunities. GPU register files are huge because they must hold many concurrent execution contexts and enable fast context switching. As a result, the register file consumes a large fraction of the total GPU chip power. GPU design trends show that the register file size will continue to increase to enable even more thread-level parallelism. To reduce register file data redundancy, Warped-Compression uses a low-cost and implementation-efficient base-delta-immediate (BDI) compression scheme that takes advantage of the banked register file organization used in GPUs. Since threads within a warp write values with strong similarity, BDI can quickly compress and decompress by selecting either a single register or one of the register banks as the primary base and then computing delta values for all the other registers or banks. Warped-Compression can be used to reduce both dynamic and leakage power. By compressing register values, each warp-level register access activates fewer register banks, which reduces dynamic power. When fewer banks are used to store the register content, leakage power can be reduced by power gating the unused banks. Evaluation results show that register compression saves 25% of the total register file power consumption.
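
The base-plus-delta idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's hardware design: the warp size, register width, bank width, candidate delta widths, the choice of thread 0 as the base, and all function names below are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

# All constants are illustrative assumptions, not values taken from the abstract.
WARP_SIZE = 32           # threads per warp
REG_BYTES = 4            # one 32-bit register per thread
BANK_BYTES = 16          # bytes of one warp register held by a single bank
DELTA_SIZES = (0, 1, 2)  # candidate delta widths in bytes, tried smallest first


def compress_warp_register(values: List[int]) -> Tuple[int, Optional[int], List[int]]:
    """Return (base, delta_bytes, deltas); delta_bytes is None if incompressible.

    Thread 0's value is used as the base (an assumption). Deltas are plain
    signed differences; 32-bit wraparound is ignored in this sketch.
    """
    base = values[0]
    deltas = [v - base for v in values]
    for size in DELTA_SIZES:
        if size == 0:
            fits = all(d == 0 for d in deltas)
        else:
            lo = -(1 << (8 * size - 1))
            hi = (1 << (8 * size - 1)) - 1
            fits = all(lo <= d <= hi for d in deltas)
        if fits:
            return base, size, deltas
    return base, None, deltas


def decompress_warp_register(base: int, deltas: List[int]) -> List[int]:
    """Rebuild every thread's value by adding its delta back to the base."""
    return [base + d for d in deltas]


def compressed_bytes(delta_bytes: Optional[int], n: int = WARP_SIZE) -> int:
    """Storage for one warp register: a full base plus one delta per other thread."""
    if delta_bytes is None:
        return n * REG_BYTES
    return REG_BYTES + delta_bytes * (n - 1)


def banks_needed(num_bytes: int) -> int:
    """Number of banks touched when storing num_bytes (ceiling division)."""
    return (num_bytes + BANK_BYTES - 1) // BANK_BYTES


# Example: an address-like register where each thread is 4 bytes past the previous one.
stride_values = [1000 + 4 * t for t in range(WARP_SIZE)]
base, delta_bytes, deltas = compress_warp_register(stride_values)
restored = decompress_warp_register(base, deltas)
banks_compressed = banks_needed(compressed_bytes(delta_bytes))
banks_uncompressed = banks_needed(compressed_bytes(None))
```

Under these assumed parameters, the stride-4 example fits in 1-byte deltas and occupies 3 banks instead of 8. The untouched banks are the ones that, per the abstract, need not be activated on an access and could be power gated.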