{"title":"A Profile Guided Approach to Optimize Branch Divergence While Transforming Applications for GPUs","authors":"S. Sarkar, Sayantan Mitra","doi":"10.1145/2723742.2723760","DOIUrl":null,"url":null,"abstract":"GPUs offer a powerful bulk synchronous programming model for exploiting data parallelism; however, branch divergence amongst executing warps can lead to serious performance degradation due to execution serialization. We propose a novel profile guided approach to optimize branch divergence while transforming a serial program to a data-parallel program for GPUs. Our approach is based on the observation that branches inside some data parallel loops although divergent, exhibit repetitive regular patterns of outcomes. By exploiting such patterns, loop iterations can be aligned so that the corresponding iterations traverse the same branch path. These aligned iterations when executed as a warp in a GPU, become convergent. We propose a new metric based on the repetitive pattern characteristics that indicates whether a data-parallel loop is worth restructuring. When tested our approach on the well-known Rodinia benchmark, we found that it is possible to achieve upto 48% performance improvement by loop restructuring suggested by the patterns and our metrics.","PeriodicalId":288030,"journal":{"name":"Proceedings of the 8th India Software Engineering Conference","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 8th India Software Engineering Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2723742.2723760","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 7
Abstract
GPUs offer a powerful bulk synchronous programming model for exploiting data parallelism; however, branch divergence among executing warps can lead to serious performance degradation due to execution serialization. We propose a novel profile guided approach to optimize branch divergence while transforming a serial program into a data-parallel program for GPUs. Our approach is based on the observation that branches inside some data-parallel loops, although divergent, exhibit repetitive, regular patterns of outcomes. By exploiting such patterns, loop iterations can be aligned so that corresponding iterations traverse the same branch path. These aligned iterations, when executed as a warp on a GPU, become convergent. We propose a new metric, based on the characteristics of the repetitive patterns, that indicates whether a data-parallel loop is worth restructuring. When we tested our approach on the well-known Rodinia benchmark, we found that it is possible to achieve up to 48% performance improvement through the loop restructuring suggested by the patterns and our metric.
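To make the iteration-alignment idea concrete, the following is a minimal CUDA sketch of the general technique described above, not the paper's actual implementation: the branch outcome of each loop iteration is profiled, iteration indices are reordered so that iterations taking the same branch path become adjacent, and each warp then executes a (mostly) convergent branch. The kernel name `alignedKernel`, the permutation built on the host, and the specific branch condition are illustrative assumptions.

```cuda
// Sketch only: group data-parallel loop iterations by their profiled branch
// outcome so that the 32 threads of a warp take the same side of the branch.
#include <vector>
#include <numeric>
#include <algorithm>
#include <cuda_runtime.h>

__global__ void alignedKernel(const float* in, float* out, const int* perm, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    int i = perm[tid];            // remapped iteration index (hypothetical alignment step)
    if (in[i] > 0.0f)             // branch is now (mostly) uniform within a warp
        out[i] = in[i] * 2.0f;    // "then" path
    else
        out[i] = -in[i];          // "else" path
}

int main() {
    const int n = 1 << 20;
    std::vector<float> h_in(n);
    for (int i = 0; i < n; ++i)
        h_in[i] = (i % 3 == 0) ? -1.0f : 1.0f;   // repetitive outcome pattern, as in the paper's observation

    // Profiling pass (done here on the host): record each iteration's branch
    // outcome and stable-sort iteration indices by it, so iterations that
    // follow the same path land in the same warp after remapping.
    std::vector<int> perm(n);
    std::iota(perm.begin(), perm.end(), 0);
    std::stable_sort(perm.begin(), perm.end(),
                     [&](int a, int b) { return (h_in[a] > 0.0f) < (h_in[b] > 0.0f); });

    float *d_in, *d_out; int *d_perm;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_perm, n * sizeof(int));
    cudaMemcpy(d_in,   h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_perm, perm.data(), n * sizeof(int),   cudaMemcpyHostToDevice);

    alignedKernel<<<(n + 255) / 256, 256>>>(d_in, d_out, d_perm, n);
    cudaDeviceSynchronize();

    cudaFree(d_in); cudaFree(d_out); cudaFree(d_perm);
    return 0;
}
```

Note that remapping indices in this way can trade branch convergence for less coalesced memory access, which is one reason the paper's metric for deciding whether a loop is worth restructuring matters.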