用于砖分解的模板SIMD代码生成

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Pub Date : 2018-02-10 DOI:10.1145/3178487.3178537

Tuowen Zhao, Mary W. Hall, P. Basu, Samuel Williams, H. Johansen

{"title":"用于砖分解的模板SIMD代码生成","authors":"Tuowen Zhao, Mary W. Hall, P. Basu, Samuel Williams, H. Johansen","doi":"10.1145/3178487.3178537","DOIUrl":null,"url":null,"abstract":"We present a stencil library and associated compiler code generation framework designed to maximize performance on higher-order stencil computations through the use of two main technologies: a fine-grained brick data layout designed to exploit the inherent multidimensional spatial locality endemic to stencil computations, and a vector scatter associative reordering transformation that reduces vector loads and alignment operations and exposes opportunities for the backend compiler to reduce computation. For a range of stencil computations, we compare the generated code expressed in the brick library to the standard tiled code. We attain up to a 7.2X speedup on the most complex stencils when running on an Intel Knights Landing (Xeon Phi) processor.","PeriodicalId":193776,"journal":{"name":"Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"SIMD code generation for stencils on brick decompositions\",\"authors\":\"Tuowen Zhao, Mary W. Hall, P. Basu, Samuel Williams, H. Johansen\",\"doi\":\"10.1145/3178487.3178537\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present a stencil library and associated compiler code generation framework designed to maximize performance on higher-order stencil computations through the use of two main technologies: a fine-grained brick data layout designed to exploit the inherent multidimensional spatial locality endemic to stencil computations, and a vector scatter associative reordering transformation that reduces vector loads and alignment operations and exposes opportunities for the backend compiler to reduce computation. For a range of stencil computations, we compare the generated code expressed in the brick library to the standard tiled code. We attain up to a 7.2X speedup on the most complex stencils when running on an Intel Knights Landing (Xeon Phi) processor.\",\"PeriodicalId\":193776,\"journal\":{\"name\":\"Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming\",\"volume\":\"26 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-02-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3178487.3178537\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3178487.3178537","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 4

摘要

我们提出了一个模板库和相关的编译器代码生成框架，旨在通过使用两种主要技术来最大化高阶模板计算的性能:一种细粒度的砖块数据布局，旨在利用模板计算特有的固有多维空间局部性;一种矢量分散关联重排序转换，减少了矢量负载和对齐操作，并为后端编译器提供了减少计算的机会。对于一系列模板计算，我们将砖库中表示的生成代码与标准平铺代码进行比较。当在Intel Knights Landing (Xeon Phi)处理器上运行时，我们在最复杂的模板上获得了高达7.2倍的加速。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

SIMD code generation for stencils on brick decompositions

We present a stencil library and associated compiler code generation framework designed to maximize performance on higher-order stencil computations through the use of two main technologies: a fine-grained brick data layout designed to exploit the inherent multidimensional spatial locality endemic to stencil computations, and a vector scatter associative reordering transformation that reduces vector loads and alignment operations and exposes opportunities for the backend compiler to reduce computation. For a range of stencil computations, we compare the generated code expressed in the brick library to the standard tiled code. We attain up to a 7.2X speedup on the most complex stencils when running on an Intel Knights Landing (Xeon Phi) processor.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

自引率

0.00%

发文量