Workload-Aware Power Gating Design and Run-Time Management for Massively Parallel GPGPUs

2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). Pub Date: 2016-07-01. DOI: 10.1109/ISVLSI.2016.60

K. Dev, S. Reda, Indrani Paul, Wei Huang, W. Burleson


Citations: 8

Abstract

Power gating (PG) is an effective technique for improving power efficiency. Future general-purpose graphics processing units (GPGPUs) will likely feature hundreds of compute units (CUs) and be power constrained, which poses serious challenges for existing PG methodologies. In this paper, we propose novel design-time and run-time techniques to implement power gating effectively in future GPGPUs. Using industrial models and measurement facilities, we show that designers must consider the run-time parallelism of potential applications when implementing power gating designs, to avoid incurring unnecessary design overheads. By scaling measurements from a real 28 nm GPGPU to a hypothetical future 10 nm node, we show that a PG granularity of 16 CUs per cluster achieves 99% of peak run-time performance without the excessive 53% design-time area overhead of per-CU PG. We also demonstrate that a run-time power management algorithm that is aware of the PG granularity delivers up to 18% additional performance through frequency boosting under thermal design power (TDP) constraints.
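The granularity trade-off described in the abstract can be illustrated with a small piece of arithmetic. The sketch below is not the paper's model. It assumes, purely for illustration, that power can be switched only for whole clusters and that the scheduler packs active CUs into as few clusters as possible. The function names and the 128-CU device size are hypothetical.

```python
import math


def powered_cus(active_cus: int, cluster_size: int, total_cus: int) -> int:
    """Number of CUs that remain powered when gating works per whole cluster.

    Assumption (not from the paper): active CUs are packed into the fewest
    possible clusters, and every CU in a powered cluster stays on.
    """
    if cluster_size <= 0 or total_cus % cluster_size != 0:
        raise ValueError("total_cus must be a positive multiple of cluster_size")
    if not 0 <= active_cus <= total_cus:
        raise ValueError("active_cus must lie between 0 and total_cus")
    clusters_on = math.ceil(active_cus / cluster_size)
    return clusters_on * cluster_size


def idle_but_powered(active_cus: int, cluster_size: int, total_cus: int) -> int:
    """CUs that do no work but cannot be gated at this granularity."""
    return powered_cus(active_cus, cluster_size, total_cus) - active_cus


if __name__ == "__main__":
    TOTAL = 128  # hypothetical device size, chosen only for this example
    for size in (1, 4, 16):
        wasted = idle_but_powered(20, size, TOTAL)
        print(f"cluster size {size}: {wasted} idle CUs stay powered")
```

Coarser clusters leave more idle CUs powered for a given workload, while finer clusters require more gating hardware. Weighing that run-time cost against the design-time area overhead is the trade-off the abstract refers to.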



