Dynamic trace-based analysis of vectorization potential of applications

Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation
Pub Date: 2012-06-11
DOI: 10.1145/2254064.2254108

Justin Holewinski, R. Ramamurthi, Mahesh Ravishankar, Naznin Fauzia, L. Pouchet, A. Rountev, P. Sadayappan


Citations: 62

Abstract

Recent hardware trends with GPUs and the increasing vector lengths of SSE-like ISA extensions for multicore CPUs imply that effective exploitation of SIMD parallelism is critical for achieving high performance on emerging and future architectures. A vast majority of existing applications were developed without any attention by their developers towards effective vectorizability of the codes. While developers of production compilers such as GNU gcc, Intel icc, PGI pgcc, and IBM xlc have invested considerable effort and made significant advances in enhancing automatic vectorization capabilities, these compilers still cannot effectively vectorize many existing scientific and engineering codes. It is therefore of considerable interest to analyze existing applications to assess the inherent latent potential for SIMD parallelism, exploitable through further compiler advances and/or via manual code changes. In this paper we develop an approach to infer a program's SIMD parallelization potential by analyzing the dynamic data-dependence graph derived from a sequential execution trace. By considering only the observed run-time data dependences for the trace, and by relaxing the execution order of operations to allow any dependence-preserving reordering, we can detect potential SIMD parallelism that may otherwise be missed by more conservative compile-time analyses. We show that for several benchmarks our tool discovers regions of code within computationally-intensive loops that exhibit high potential for SIMD parallelism but are not vectorized by state-of-the-art compilers. We present several case studies of the use of the tool, both in identifying opportunities to enhance the transformation capabilities of vectorizing compilers, as well as in pointing to code regions to manually modify in order to enable auto-vectorization and performance improvement by existing compilers.
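The core idea of the abstract, deriving a dynamic data-dependence graph from a sequential trace and then allowing any dependence-preserving reordering, can be illustrated with a small sketch. This is not the authors' tool or algorithm: the trace format, the function name `simd_groups`, and the grouping rule (instances of the same static instruction that land on the same dependence level) are assumptions made for illustration. The sketch also assumes that only observed read-after-write dependences constrain the order, as if every write went to a fresh location.

```python
from collections import defaultdict

def simd_groups(trace):
    """Group trace entries that could, in principle, execute together.

    trace: list of (static_id, dest, sources) tuples in sequential
    execution order (a hypothetical format, not the paper's).
    Returns {(static_id, level): [trace indices]}.
    """
    last_writer = {}            # location -> index of the entry that last wrote it
    level = []                  # level[i] = earliest step at which entry i can run
    groups = defaultdict(list)
    for i, (static_id, dest, sources) in enumerate(trace):
        # Edges of the dynamic dependence graph: observed flow dependences only.
        preds = [last_writer[s] for s in sources if s in last_writer]
        # An entry runs one step after its latest predecessor (step 0 if none).
        lvl = 1 + max((level[p] for p in preds), default=-1)
        level.append(lvl)
        groups[(static_id, lvl)].append(i)
        last_writer[dest] = i
    return dict(groups)

# S1: a[i] = b[i] + c[i]   (independent iterations)
# S2: s = s + a[i]         (each iteration depends on the previous one)
trace = [("S1", f"a{i}", (f"b{i}", f"c{i}")) for i in range(4)]
trace += [("S2", "s", ("s", f"a{i}")) for i in range(4)]

groups = simd_groups(trace)
for key in sorted(groups):
    print(key, groups[key])
```

Two entries on the same level can never be connected by a dependence path, so a large group for one static instruction hints at SIMD potential, while groups of size one (as for S2 here) indicate a serial chain. Because the levels come from the addresses actually touched at run time, the same procedure would also report independence for accesses such as indirect indexing that a compile-time analysis must treat conservatively. The result holds only for the traced input.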

