折纸:节能gpu的折叠翘曲

Proceedings of the 2016 International Conference on Supercomputing Pub Date : 2016-06-01 DOI:10.1145/2925426.2926281

Mohammad Abdel-Majeed, Daniel Wong, Justin Kuang, M. Annavaram

{"title":"折纸:节能gpu的折叠翘曲","authors":"Mohammad Abdel-Majeed, Daniel Wong, Justin Kuang, M. Annavaram","doi":"10.1145/2925426.2926281","DOIUrl":null,"url":null,"abstract":"Graphical processing units (GPUs) are increasingly used to run a wide range of general purpose applications. Due to wide variation in application parallelism and inherent application level inefficiencies, GPUs experience significant idle periods. In this work, we first show that significant fine-grain pipeline bubbles exist regardless of warp scheduling policies or workloads. We propose to convert these bubbles into energy saving opportunities using Origami. Origami consists of two components: Warp Folding and the Origami scheduler. With Warp Folding, warps are split into two half-warps which are issued in succession. Warp Folding leaves half of the execution lanes idle, which is then exploited to improve energy efficiency through power gating. Origami scheduler is a new warp scheduler that is cognizant of the Warp Folding process and tries to further extend the sleep times of idle execution lanes. By combining the two techniques Origami can save 49% and 46% of the leakage energy in the integer and floating point pipelines, respectively. These savings are better than or at least on-par with Warped-Gates, a prior power gating technique that power gates the entire cluster of execution lanes. But Origami achieves these energy savings without relying on forcing idleness on execution lanes, which leads to performance losses, as has been proposed in Warped-Gates. Hence, Origami is able to achieve these energy savings with virtually no performance overhead.","PeriodicalId":422112,"journal":{"name":"Proceedings of the 2016 International Conference on Supercomputing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Origami: Folding Warps for Energy Efficient GPUs\",\"authors\":\"Mohammad Abdel-Majeed, Daniel Wong, Justin Kuang, M. Annavaram\",\"doi\":\"10.1145/2925426.2926281\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Graphical processing units (GPUs) are increasingly used to run a wide range of general purpose applications. Due to wide variation in application parallelism and inherent application level inefficiencies, GPUs experience significant idle periods. In this work, we first show that significant fine-grain pipeline bubbles exist regardless of warp scheduling policies or workloads. We propose to convert these bubbles into energy saving opportunities using Origami. Origami consists of two components: Warp Folding and the Origami scheduler. With Warp Folding, warps are split into two half-warps which are issued in succession. Warp Folding leaves half of the execution lanes idle, which is then exploited to improve energy efficiency through power gating. Origami scheduler is a new warp scheduler that is cognizant of the Warp Folding process and tries to further extend the sleep times of idle execution lanes. By combining the two techniques Origami can save 49% and 46% of the leakage energy in the integer and floating point pipelines, respectively. These savings are better than or at least on-par with Warped-Gates, a prior power gating technique that power gates the entire cluster of execution lanes. But Origami achieves these energy savings without relying on forcing idleness on execution lanes, which leads to performance losses, as has been proposed in Warped-Gates. Hence, Origami is able to achieve these energy savings with virtually no performance overhead.\",\"PeriodicalId\":422112,\"journal\":{\"name\":\"Proceedings of the 2016 International Conference on Supercomputing\",\"volume\":\"43 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2016 International Conference on Supercomputing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2925426.2926281\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2016 International Conference on Supercomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2925426.2926281","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 11

摘要

图形处理单元(gpu)越来越多地用于运行广泛的通用应用程序。由于应用程序并行性的广泛差异和固有的应用程序级别的低效率，gpu经历了相当长的空闲期。在这项工作中，我们首先表明，无论翘曲调度策略或工作负载如何，都存在显著的细粒度管道气泡。我们建议用折纸将这些气泡转化为节能的机会。折纸由两个组件组成:翘曲折叠和折纸调度程序。经纱折叠时，经纱被分成两个半经纱，这两个半经纱是连续发出的。翘曲折叠使一半的执行通道闲置，然后利用它通过功率门控来提高能源效率。Origami调度器是一种新的调度器，它认识到warp折叠过程，并试图进一步延长空闲执行通道的睡眠时间。通过结合这两种技术，Origami在整数和浮点管道中分别可以节省49%和46%的泄漏能量。这些节省比warp - gates更好，或者至少与warp - gates相当，后者是一种先前的功率门控技术，可以对整个执行通道集群进行功率门控。但是Origami实现了这些节能，而不依赖于强制执行通道上的空闲，这会导致性能损失，正如在warp - gates中提出的那样。因此，Origami能够在几乎没有性能开销的情况下实现这些节能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Origami: Folding Warps for Energy Efficient GPUs

Graphical processing units (GPUs) are increasingly used to run a wide range of general purpose applications. Due to wide variation in application parallelism and inherent application level inefficiencies, GPUs experience significant idle periods. In this work, we first show that significant fine-grain pipeline bubbles exist regardless of warp scheduling policies or workloads. We propose to convert these bubbles into energy saving opportunities using Origami. Origami consists of two components: Warp Folding and the Origami scheduler. With Warp Folding, warps are split into two half-warps which are issued in succession. Warp Folding leaves half of the execution lanes idle, which is then exploited to improve energy efficiency through power gating. Origami scheduler is a new warp scheduler that is cognizant of the Warp Folding process and tries to further extend the sleep times of idle execution lanes. By combining the two techniques Origami can save 49% and 46% of the leakage energy in the integer and floating point pipelines, respectively. These savings are better than or at least on-par with Warped-Gates, a prior power gating technique that power gates the entire cluster of execution lanes. But Origami achieves these energy savings without relying on forcing idleness on execution lanes, which leads to performance losses, as has been proposed in Warped-Gates. Hence, Origami is able to achieve these energy savings with virtually no performance overhead.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 2016 International Conference on Supercomputing

自引率

0.00%

发文量