部分控制流线性化

Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation Pub Date : 2018-06-11 DOI:10.1145/3192366.3192413

Simon Moll, Sebastian Hack

{"title":"部分控制流线性化","authors":"Simon Moll, Sebastian Hack","doi":"10.1145/3192366.3192413","DOIUrl":null,"url":null,"abstract":"If-conversion is a fundamental technique for vectorization. It accounts for the fact that in a SIMD program, several targets of a branch might be executed because of divergence. Especially for irregular data-parallel workloads, it is crucial to avoid if-converting non-divergent branches to increase SIMD utilization. In this paper, we present partial linearization, a simple and efficient if-conversion algorithm that overcomes several limitations of existing if-conversion techniques. In contrast to prior work, it has provable guarantees on which non-divergent branches are retained and will never duplicate code or insert additional branches. We show how our algorithm can be used in a classic loop vectorizer as well as to implement data-parallel languages such as ISPC or OpenCL. Furthermore, we implement prior vectorizer optimizations on top of partial linearization in a more general way. We evaluate the implementation of our algorithm in LLVM on a range of irregular data analytics kernels, a neutronics simulation benchmark and NAB, a molecular dynamics benchmark from SPEC2017 on AVX2, AVX512, and ARM Advanced SIMD machines and report speedups of up to 146 % over ICC, GCC and Clang O3.","PeriodicalId":20583,"journal":{"name":"Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"24 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2018-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"33","resultStr":"{\"title\":\"Partial control-flow linearization\",\"authors\":\"Simon Moll, Sebastian Hack\",\"doi\":\"10.1145/3192366.3192413\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"If-conversion is a fundamental technique for vectorization. It accounts for the fact that in a SIMD program, several targets of a branch might be executed because of divergence. Especially for irregular data-parallel workloads, it is crucial to avoid if-converting non-divergent branches to increase SIMD utilization. In this paper, we present partial linearization, a simple and efficient if-conversion algorithm that overcomes several limitations of existing if-conversion techniques. In contrast to prior work, it has provable guarantees on which non-divergent branches are retained and will never duplicate code or insert additional branches. We show how our algorithm can be used in a classic loop vectorizer as well as to implement data-parallel languages such as ISPC or OpenCL. Furthermore, we implement prior vectorizer optimizations on top of partial linearization in a more general way. We evaluate the implementation of our algorithm in LLVM on a range of irregular data analytics kernels, a neutronics simulation benchmark and NAB, a molecular dynamics benchmark from SPEC2017 on AVX2, AVX512, and ARM Advanced SIMD machines and report speedups of up to 146 % over ICC, GCC and Clang O3.\",\"PeriodicalId\":20583,\"journal\":{\"name\":\"Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation\",\"volume\":\"24 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-06-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"33\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3192366.3192413\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3192366.3192413","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 33

摘要

if转换是向量化的基本技术。它解释了这样一个事实，即在SIMD程序中，分支的几个目标可能因为分歧而被执行。特别是对于不规则的数据并行工作负载，避免if转换非发散分支以增加SIMD利用率是至关重要的。在本文中，我们提出了部分线性化，一种简单而有效的中频转换算法，克服了现有中频转换技术的几个局限性。与之前的工作相比，它具有可证明的保证，可以保留非发散的分支，并且永远不会复制代码或插入额外的分支。我们展示了如何在经典的循环矢量器中使用我们的算法，以及如何实现数据并行语言，如ISPC或OpenCL。此外，我们以更一般的方式在部分线性化的基础上实现了先验矢量器优化。我们在一系列不规则数据分析内核上评估了我们的算法在LLVM中的实现，其中包括一个中子模拟基准和NAB(来自AVX2, AVX512和ARM Advanced SIMD机器上的SPEC2017的分子动力学基准)，并报告了比ICC, GCC和Clang O3的速度高达146%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Partial control-flow linearization

If-conversion is a fundamental technique for vectorization. It accounts for the fact that in a SIMD program, several targets of a branch might be executed because of divergence. Especially for irregular data-parallel workloads, it is crucial to avoid if-converting non-divergent branches to increase SIMD utilization. In this paper, we present partial linearization, a simple and efficient if-conversion algorithm that overcomes several limitations of existing if-conversion techniques. In contrast to prior work, it has provable guarantees on which non-divergent branches are retained and will never duplicate code or insert additional branches. We show how our algorithm can be used in a classic loop vectorizer as well as to implement data-parallel languages such as ISPC or OpenCL. Furthermore, we implement prior vectorizer optimizations on top of partial linearization in a more general way. We evaluate the implementation of our algorithm in LLVM on a range of irregular data analytics kernels, a neutronics simulation benchmark and NAB, a molecular dynamics benchmark from SPEC2017 on AVX2, AVX512, and ARM Advanced SIMD machines and report speedups of up to 146 % over ICC, GCC and Clang O3.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation

自引率

0.00%

发文量