Conflict-free vectorization of associative irregular applications with recent SIMD architectural advances

Proceedings of the 2018 International Symposium on Code Generation and Optimization. Pub Date: 2018-02-24. DOI: 10.1145/3168827

Peng Jiang, G. Agrawal


Citations: 6

Abstract

Irregular applications that involve indirect memory accesses were traditionally considered unsuitable for SIMD processing. Though some progress has been made in recent years, existing approaches require either expensive data reorganization or a favorable input distribution to deliver good performance. In this work, we propose a novel vectorization approach called in-vector reduction that can efficiently accelerate a class of associative irregular applications. This approach exploits the associativity of irregular reductions to resolve data conflicts within SIMD vectors. We implement in-vector reduction with the new conflict-detecting instructions supported in the Intel AVX-512 instruction set and provide a programming interface to facilitate the vectorization of such associative irregular applications. Compared with previous approaches, in-vector reduction eliminates a large part of the data reorganization overhead and achieves high SIMD utilization even under adverse input distributions. The evaluation results show that our approach is efficient in vectorizing a diverse set of irregular applications, including graph algorithms, particle simulation codes, and hash-based aggregation. Our vectorization achieves 1.5x to 5.5x speedups over the original sequential codes on a single core of an Intel Xeon Phi and outperforms a competing approach, conflict-masking-based vectorization, by 1.4x to 11.8x.
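The idea of resolving conflicts inside a vector can be illustrated with a small scalar model. The sketch below is not the paper's implementation and uses no AVX-512 intrinsics: it models a SIMD vector as a Python list, and the names `conflict_detect`, `in_vector_reduce`, `irregular_reduce`, and `WIDTH` are hypothetical, chosen only for this illustration. It assumes the conflict-detection step reports, for each lane, which earlier lanes hold the same index, and that the reduction operator is associative.

```python
from operator import add

WIDTH = 16  # assumed lane count: 32-bit elements in a 512-bit vector


def conflict_detect(idx):
    """Model of a conflict-detection step: bit j of result[i] is set
    when an earlier lane j < i holds the same index as lane i."""
    return [sum(1 << j for j in range(i) if idx[j] == idx[i])
            for i in range(len(idx))]


def in_vector_reduce(idx, vals, op=add):
    """Fold the values of lanes that share an index into the first such lane.

    Returns the reduced values and a keep-mask; lanes with keep[i] == True
    carry distinct indices, so updating memory from them is conflict-free.
    """
    conflicts = conflict_detect(idx)
    vals = list(vals)
    keep = []
    for lane, bits in enumerate(conflicts):
        if bits == 0:
            keep.append(True)  # first occurrence of this index
        else:
            # Lowest set bit = earliest lane with the same index.
            first = (bits & -bits).bit_length() - 1
            vals[first] = op(vals[first], vals[lane])
            keep.append(False)
    return vals, keep


def irregular_reduce(indices, values, out, op=add, width=WIDTH):
    """Apply out[indices[k]] = op(out[indices[k]], values[k]) one vector at a time."""
    for start in range(0, len(indices), width):
        idx = indices[start:start + width]
        vals, keep = in_vector_reduce(idx, values[start:start + width], op)
        # Masked update: only conflict-free lanes touch memory.
        for lane, active in enumerate(keep):
            if active:
                out[idx[lane]] = op(out[idx[lane]], vals[lane])
    return out
```

Because duplicates are merged within the vector rather than masked off and retried, every lane does useful work in a single pass even when many lanes target the same location, which is the situation the abstract describes as an adverse input distribution. A hardware version would perform the merge with vector permutes and masked operations rather than a lane-by-lane loop.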


Source journal

Proceedings of the 2018 International Symposium on Code Generation and Optimization

Self-citation rate

0.00%

Articles published