Origami: Folding Warps for Energy Efficient GPUs

Proceedings of the 2016 International Conference on Supercomputing. Pub Date: 2016-06-01. DOI: 10.1145/2925426.2926281

Mohammad Abdel-Majeed, Daniel Wong, Justin Kuang, M. Annavaram


Citations: 11

Abstract

Graphics processing units (GPUs) are increasingly used to run a wide range of general-purpose applications. Because of wide variation in application parallelism and inherent application-level inefficiencies, GPUs experience significant idle periods. In this work, we first show that significant fine-grained pipeline bubbles exist regardless of warp scheduling policy or workload. We propose to convert these bubbles into energy-saving opportunities using Origami. Origami consists of two components: Warp Folding and the Origami scheduler. With Warp Folding, each warp is split into two half-warps that are issued in succession. Warp Folding leaves half of the execution lanes idle, and this idleness is exploited to improve energy efficiency through power gating. The Origami scheduler is a new warp scheduler that is aware of the Warp Folding process and tries to further extend the sleep times of idle execution lanes. By combining the two techniques, Origami can save 49% and 46% of the leakage energy in the integer and floating-point pipelines, respectively. These savings are better than, or at least on par with, those of Warped-Gates, a prior power-gating technique that power gates the entire cluster of execution lanes. Unlike Warped-Gates, however, Origami achieves these savings without forcing idleness on execution lanes, which leads to performance losses. Hence, Origami achieves these energy savings with virtually no performance overhead.
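The folding idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's implementation: the warp size of 32, the one-lane-per-thread layout, the single-cycle execution, the choice to run both half-warps on the lower 16 lanes, and the names `lane_busy_map` and `longest_idle_run` are all assumptions made for illustration. It shows how a schedule with a one-cycle bubble after every warp gives every lane only short idle gaps when unfolded, but leaves half of the lanes idle continuously when each warp is folded into the bubble that follows it.

```python
# Toy model of Warp Folding (illustrative assumptions only, not the paper's design).
WARP_SIZE = 32            # assumed threads per warp
NUM_LANES = 32            # assumed: one execution lane per thread
HALF = NUM_LANES // 2     # lanes used by a half-warp


def lane_busy_map(issue_cycles, folded, total_cycles):
    """Return busy[lane][cycle] for warps issued at the given cycles.

    Unfolded: a warp occupies all lanes for one cycle.
    Folded:   a warp is split into two half-warps issued back to back
              on the lower half of the lanes; the upper half stays idle.
    """
    busy = [[False] * total_cycles for _ in range(NUM_LANES)]
    for cycle in issue_cycles:
        if folded:
            for c in (cycle, cycle + 1):      # two half-warps in succession
                for lane in range(HALF):
                    busy[lane][c] = True
        else:
            for lane in range(NUM_LANES):     # full warp in a single cycle
                busy[lane][cycle] = True
    return busy


def longest_idle_run(lane_row):
    """Length of the longest stretch of consecutive idle cycles for one lane."""
    best = run = 0
    for is_busy in lane_row:
        run = 0 if is_busy else run + 1
        best = max(best, run)
    return best


if __name__ == "__main__":
    issue = [0, 2, 4, 6]  # a warp every other cycle: one bubble after each
    for folded in (False, True):
        busy = lane_busy_map(issue, folded, total_cycles=8)
        print("folded" if folded else "unfolded",
              "lane 0 idle run:", longest_idle_run(busy[0]),
              "lane 31 idle run:", longest_idle_run(busy[NUM_LANES - 1]))
```

In this toy schedule the total work is the same in both cases (128 busy lane-cycles), but folding turns eight scattered one-cycle gaps per lane into one uninterrupted eight-cycle idle window on lanes 16 to 31, which is the kind of window a power-gating scheme would need. How long a window must be to pay off, and how the Origami scheduler extends it, are not modeled here.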

