μC-States:细粒度GPU数据路径电源管理

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) Pub Date : 2016-09-11 DOI:10.1145/2967938.2967941

Onur Kayiran, Adwait Jog, Ashutosh Pattnaik, Rachata Ausavarungnirun, Xulong Tang, M. Kandemir, G. Loh, O. Mutlu, C. Das

{"title":"μC-States:细粒度GPU数据路径电源管理","authors":"Onur Kayiran, Adwait Jog, Ashutosh Pattnaik, Rachata Ausavarungnirun, Xulong Tang, M. Kandemir, G. Loh, O. Mutlu, C. Das","doi":"10.1145/2967938.2967941","DOIUrl":null,"url":null,"abstract":"To improve the performance of Graphics Processing Units (GPUs) beyond simply increasing core count, architects are recently adopting a scale-up approach: the peak throughput and individual capabilities of the GPU cores are increasing rapidly. This big-core trend in GPUs leads to various challenges, including higher static power consumption and lower and imbalanced utilization of the datapath components of a big core. As we show in this paper, two key problems ensue: (1) the lower and imbalanced datapath utilization can waste power as an application does not always utilize all portions of the big core datapath, and (2) the use of big cores can lead to application performance degradation in some cases due to the higher memory system contention caused by the more memory requests generated by each big core. This paper introduces a new analysis of datapath component utilization in big-core GPUs based on queuing theory principles. Building on this analysis, we introduce a fine-grained dynamic power- and clock-gating mechanism for the entire datapath, called μC-States, which aims to minimize power consumption by turning off or tuning-down datapath components that are not bottlenecks for the performance of the running application. Our experimental evaluation demonstrates that μC-States significantly reduces both static and dynamic power consumption in a big-core GPU, while also significantly improving the performance of applications affected by high memory system contention. We also show that our analysis of datapath component utilization can guide scheduling and design decisions in a GPU architecture that contains heterogeneous cores.","PeriodicalId":407717,"journal":{"name":"2016 International Conference on Parallel Architecture and Compilation Techniques (PACT)","volume":"91 6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"40","resultStr":"{\"title\":\"μC-States: Fine-grained GPU datapath power management\",\"authors\":\"Onur Kayiran, Adwait Jog, Ashutosh Pattnaik, Rachata Ausavarungnirun, Xulong Tang, M. Kandemir, G. Loh, O. Mutlu, C. Das\",\"doi\":\"10.1145/2967938.2967941\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"To improve the performance of Graphics Processing Units (GPUs) beyond simply increasing core count, architects are recently adopting a scale-up approach: the peak throughput and individual capabilities of the GPU cores are increasing rapidly. This big-core trend in GPUs leads to various challenges, including higher static power consumption and lower and imbalanced utilization of the datapath components of a big core. As we show in this paper, two key problems ensue: (1) the lower and imbalanced datapath utilization can waste power as an application does not always utilize all portions of the big core datapath, and (2) the use of big cores can lead to application performance degradation in some cases due to the higher memory system contention caused by the more memory requests generated by each big core. This paper introduces a new analysis of datapath component utilization in big-core GPUs based on queuing theory principles. Building on this analysis, we introduce a fine-grained dynamic power- and clock-gating mechanism for the entire datapath, called μC-States, which aims to minimize power consumption by turning off or tuning-down datapath components that are not bottlenecks for the performance of the running application. Our experimental evaluation demonstrates that μC-States significantly reduces both static and dynamic power consumption in a big-core GPU, while also significantly improving the performance of applications affected by high memory system contention. We also show that our analysis of datapath component utilization can guide scheduling and design decisions in a GPU architecture that contains heterogeneous cores.\",\"PeriodicalId\":407717,\"journal\":{\"name\":\"2016 International Conference on Parallel Architecture and Compilation Techniques (PACT)\",\"volume\":\"91 6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-09-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"40\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 International Conference on Parallel Architecture and Compilation Techniques (PACT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2967938.2967941\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 International Conference on Parallel Architecture and Compilation Techniques (PACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2967938.2967941","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 40

摘要

为了提高图形处理单元(GPU)的性能，而不仅仅是增加核心数量，架构师最近采用了一种按比例扩展的方法:GPU核心的峰值吞吐量和单个功能正在迅速增加。gpu的大核趋势带来了各种挑战，包括更高的静态功耗和大核数据路径组件的低利用率和不平衡利用率。正如我们在本文中所展示的，两个关键问题会出现:(1)由于应用程序并不总是利用大核心数据路径的所有部分，因此数据路径利用率较低且不平衡可能会浪费电源;(2)由于每个大核心生成的更多内存请求导致更高的内存系统争用，因此在某些情况下，使用大核心可能导致应用程序性能下降。本文介绍了一种基于排队理论的大核gpu数据路径组件利用率分析方法。在此分析的基础上，我们为整个数据路径引入了一种细粒度的动态功率和时钟门控机制，称为μC-States，其目的是通过关闭或调降不是运行应用程序性能瓶颈的数据路径组件来最小化功耗。我们的实验评估表明，μC-States显著降低了大核GPU的静态和动态功耗，同时也显著提高了受高内存系统争用影响的应用程序的性能。我们还表明，我们对数据路径组件利用率的分析可以指导包含异构内核的GPU架构中的调度和设计决策。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

μC-States: Fine-grained GPU datapath power management

To improve the performance of Graphics Processing Units (GPUs) beyond simply increasing core count, architects are recently adopting a scale-up approach: the peak throughput and individual capabilities of the GPU cores are increasing rapidly. This big-core trend in GPUs leads to various challenges, including higher static power consumption and lower and imbalanced utilization of the datapath components of a big core. As we show in this paper, two key problems ensue: (1) the lower and imbalanced datapath utilization can waste power as an application does not always utilize all portions of the big core datapath, and (2) the use of big cores can lead to application performance degradation in some cases due to the higher memory system contention caused by the more memory requests generated by each big core. This paper introduces a new analysis of datapath component utilization in big-core GPUs based on queuing theory principles. Building on this analysis, we introduce a fine-grained dynamic power- and clock-gating mechanism for the entire datapath, called μC-States, which aims to minimize power consumption by turning off or tuning-down datapath components that are not bottlenecks for the performance of the running application. Our experimental evaluation demonstrates that μC-States significantly reduces both static and dynamic power consumption in a big-core GPU, while also significantly improving the performance of applications affected by high memory system contention. We also show that our analysis of datapath component utilization can guide scheduling and design decisions in a GPU architecture that contains heterogeneous cores.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT)

自引率

0.00%

发文量