TRACO: An automatic loop nest parallelizer for numerical applications

2015 Federated Conference on Computer Science and Information Systems (FedCSIS) Pub Date: 2015-11-09 DOI: 10.15439/2015F34

M. Pałkowski, T. Klimek, W. Bielecki


Citations: 9

Abstract



We present TRACO, a source-to-source compiler that increases program locality and parallelizes arbitrarily nested loop sequences in numerical applications. Algorithms for generating tiled code and for extracting synchronization-free slices composed of tiles are presented. Parallelism of arbitrarily nested loops is obtained by creating a kernel of computations, expressed in the OpenMP standard, that can be executed independently on many CPUs. We consider benchmarks typical of compute-intensive sequences of algebra operations and of numerical computations from industry and engineering. The speed-up of programs generated by TRACO is discussed. Related compilers and techniques are considered, and future work is outlined.
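The abstract does not show any generated code, so the following is only a hand-written sketch of the general idea it describes, not TRACO's actual output. It assumes a loop nest whose single dependence runs along `j`, so every block of rows is a slice that needs no synchronization with the others; each slice is a sequence of tiles executed in order, and the slices are distributed with an OpenMP `parallel for`. The names `N`, `TILE`, `prefix_rows_reference`, `prefix_rows_tiled`, and `tiled_matches_reference` are hypothetical.

```c
#define N 64     /* problem size (hypothetical) */
#define TILE 16  /* tile edge length (hypothetical) */

/* Original nest: the only dependence goes from (i, j-1) to (i, j). */
void prefix_rows_reference(double a[N][N], double b[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 1; j < N; j++)
            a[i][j] = a[i][j - 1] + b[i][j];
}

static int min_int(int x, int y) { return x < y ? x : y; }

/* Tiled version: each value of ii selects one slice, a row of tiles.
 * No dependence crosses slices, so the slices run in parallel without
 * synchronization. Tiles inside a slice run in increasing jj order,
 * which preserves the dependence along j. */
void prefix_rows_tiled(double a[N][N], double b[N][N])
{
    #pragma omp parallel for schedule(static)
    for (int ii = 0; ii < N; ii += TILE) {
        for (int jj = 1; jj < N; jj += TILE) {
            /* One tile: at most TILE x TILE iterations. */
            for (int i = ii; i < min_int(ii + TILE, N); i++)
                for (int j = jj; j < min_int(jj + TILE, N); j++)
                    a[i][j] = a[i][j - 1] + b[i][j];
        }
    }
}

/* Runs both versions on the same input; returns 1 if results agree. */
int tiled_matches_reference(void)
{
    static double a_ref[N][N], a_tiled[N][N], b[N][N];

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            b[i][j] = (double)((i * N + j) % 7);
            a_ref[i][j] = a_tiled[i][j] = (double)i;
        }

    prefix_rows_reference(a_ref, b);
    prefix_rows_tiled(a_tiled, b);

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (a_ref[i][j] != a_tiled[i][j])
                return 0;
    return 1;
}
```

Calling `tiled_matches_reference()` from a `main` function checks that the tiled, parallel version computes the same values as the original nest. Built without OpenMP support, the pragma is ignored and the tiled code simply runs sequentially.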


Source publication

2015 Federated Conference on Computer Science and Information Systems (FedCSIS)

Self-citation rate

0.00%

Number of publications