A Profile Guided Approach to Optimize Branch Divergence While Transforming Applications for GPUs

S. Sarkar, Sayantan Mitra
{"title":"图形处理器应用转换时分支发散优化的轮廓引导方法","authors":"S. Sarkar, Sayantan Mitra","doi":"10.1145/2723742.2723760","DOIUrl":null,"url":null,"abstract":"GPUs offer a powerful bulk synchronous programming model for exploiting data parallelism; however, branch divergence amongst executing warps can lead to serious performance degradation due to execution serialization. We propose a novel profile guided approach to optimize branch divergence while transforming a serial program to a data-parallel program for GPUs. Our approach is based on the observation that branches inside some data parallel loops although divergent, exhibit repetitive regular patterns of outcomes. By exploiting such patterns, loop iterations can be aligned so that the corresponding iterations traverse the same branch path. These aligned iterations when executed as a warp in a GPU, become convergent. We propose a new metric based on the repetitive pattern characteristics that indicates whether a data-parallel loop is worth restructuring. When tested our approach on the well-known Rodinia benchmark, we found that it is possible to achieve upto 48% performance improvement by loop restructuring suggested by the patterns and our metrics.","PeriodicalId":288030,"journal":{"name":"Proceedings of the 8th India Software Engineering Conference","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"A Profile Guided Approach to Optimize Branch Divergence While Transforming Applications for GPUs\",\"authors\":\"S. Sarkar, Sayantan Mitra\",\"doi\":\"10.1145/2723742.2723760\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"GPUs offer a powerful bulk synchronous programming model for exploiting data parallelism; however, branch divergence amongst executing warps can lead to serious performance degradation due to execution serialization. We propose a novel profile guided approach to optimize branch divergence while transforming a serial program to a data-parallel program for GPUs. Our approach is based on the observation that branches inside some data parallel loops although divergent, exhibit repetitive regular patterns of outcomes. By exploiting such patterns, loop iterations can be aligned so that the corresponding iterations traverse the same branch path. These aligned iterations when executed as a warp in a GPU, become convergent. We propose a new metric based on the repetitive pattern characteristics that indicates whether a data-parallel loop is worth restructuring. 
When tested our approach on the well-known Rodinia benchmark, we found that it is possible to achieve upto 48% performance improvement by loop restructuring suggested by the patterns and our metrics.\",\"PeriodicalId\":288030,\"journal\":{\"name\":\"Proceedings of the 8th India Software Engineering Conference\",\"volume\":\"54 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-02-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 8th India Software Engineering Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2723742.2723760\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 8th India Software Engineering Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2723742.2723760","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

GPUs offer a powerful bulk-synchronous programming model for exploiting data parallelism; however, branch divergence among executing warps can lead to serious performance degradation due to execution serialization. We propose a novel profile-guided approach to optimize branch divergence while transforming a serial program into a data-parallel program for GPUs. Our approach is based on the observation that branches inside some data-parallel loops, although divergent, exhibit repetitive, regular patterns of outcomes. By exploiting such patterns, loop iterations can be aligned so that corresponding iterations traverse the same branch path. These aligned iterations, when executed as a warp on a GPU, become convergent. We propose a new metric, based on the repetitive pattern characteristics, that indicates whether a data-parallel loop is worth restructuring. When we tested our approach on the well-known Rodinia benchmark, we found that the loop restructuring suggested by the patterns and our metric can yield up to a 48% performance improvement.
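The abstract describes the idea only at a high level, so the following CUDA sketch is an illustration of the general technique rather than the authors' implementation. It shows a data-parallel loop whose branch outcome depends only on each iteration's input value, plus a host-side step that "profiles" the branch outcome per iteration and builds a permutation grouping same-outcome iterations together, so the 32 iterations mapped to one warp take a single path. All names (`heavy_path`, `light_path`, `build_aligned_index`, the `0.5f` threshold) are hypothetical.

```cuda
// Minimal sketch (assumed, not from the paper) of aligning loop iterations
// by profiled branch outcome to reduce warp divergence.
#include <cuda_runtime.h>
#include <algorithm>
#include <numeric>
#include <vector>

__device__ float heavy_path(float x) { return x * x + 1.0f; }  // placeholder work
__device__ float light_path(float x) { return x + 1.0f; }      // placeholder work

// Divergent form: threads within one warp may disagree on the predicate,
// so both paths execute serially for that warp.
__global__ void kernel_divergent(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    out[i] = (in[i] > 0.5f) ? heavy_path(in[i]) : light_path(in[i]);
}

// Aligned form: iterations are visited through a precomputed permutation so
// that the 32 consecutive slots of a warp share the same branch outcome.
__global__ void kernel_aligned(const float* in, float* out, const int* perm, int n) {
    int slot = blockIdx.x * blockDim.x + threadIdx.x;
    if (slot >= n) return;
    int i = perm[slot];  // original iteration index
    out[i] = (in[i] > 0.5f) ? heavy_path(in[i]) : light_path(in[i]);
}

// Host-side profiling step: record each iteration's branch outcome on sample
// input, then stable-sort iteration indices by outcome so warps converge.
std::vector<int> build_aligned_index(const std::vector<float>& in) {
    std::vector<int> perm(in.size());
    std::iota(perm.begin(), perm.end(), 0);
    std::stable_sort(perm.begin(), perm.end(),
                     [&](int a, int b) { return (in[a] > 0.5f) < (in[b] > 0.5f); });
    return perm;
}
```

Note that the permutation scatters the output writes, so this kind of alignment trades warp convergence against memory coalescing; a trade-off of this sort is presumably why a metric is needed to decide whether a given loop is worth restructuring, though the metric's exact form is not given in the abstract.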