多核和多核cpu的模型驱动转换

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation Pub Date : 2019-06-08 DOI:10.1145/3314221.3314653

Martin Kong, L. Pouchet

{"title":"多核和多核cpu的模型驱动转换","authors":"Martin Kong, L. Pouchet","doi":"10.1145/3314221.3314653","DOIUrl":null,"url":null,"abstract":"Modern polyhedral compilers excel at aggressively optimizing codes with static control parts, but the state-of-practice to find high-performance polyhedral transformations especially for different hardware targets still largely involves auto-tuning. In this work we propose a novel customizable polyhedral scheduling technique, with the aim of delivering high performance for several hardware targets. We design constraints and objectives that model several crucial aspects of performance such as stride optimization or the trade-off between parallelism and reuse, while considering important architectural features of the target machine. We evaluate our work using the PolyBench/C benchmark suite and experimentally validate it against large optimization spaces generated with the Pluto compiler on 3 representative architectures: an IBM Power9, an Intel Xeon Phi and an Intel Core-i9. Our results show we can achieve comparable or superior performance to Pluto on the majority of benchmarks, without implementing tiling in the source code nor using experimental autotuning.","PeriodicalId":441774,"journal":{"name":"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Model-driven transformations for multi- and many-core CPUs\",\"authors\":\"Martin Kong, L. Pouchet\",\"doi\":\"10.1145/3314221.3314653\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Modern polyhedral compilers excel at aggressively optimizing codes with static control parts, but the state-of-practice to find high-performance polyhedral transformations especially for different hardware targets still largely involves auto-tuning. In this work we propose a novel customizable polyhedral scheduling technique, with the aim of delivering high performance for several hardware targets. We design constraints and objectives that model several crucial aspects of performance such as stride optimization or the trade-off between parallelism and reuse, while considering important architectural features of the target machine. We evaluate our work using the PolyBench/C benchmark suite and experimentally validate it against large optimization spaces generated with the Pluto compiler on 3 representative architectures: an IBM Power9, an Intel Xeon Phi and an Intel Core-i9. Our results show we can achieve comparable or superior performance to Pluto on the majority of benchmarks, without implementing tiling in the source code nor using experimental autotuning.\",\"PeriodicalId\":441774,\"journal\":{\"name\":\"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-06-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3314221.3314653\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3314221.3314653","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 15

摘要

现代多面体编译器擅长积极地优化带有静态控制部分的代码，但是寻找高性能多面体转换(特别是针对不同硬件目标)的实践状态仍然主要涉及自动调优。在这项工作中，我们提出了一种新的可定制的多面体调度技术，旨在为多个硬件目标提供高性能。我们设计约束和目标来模拟性能的几个关键方面，如跨步优化或并行性和重用之间的权衡，同时考虑目标机器的重要架构特性。我们使用PolyBench/C基准测试套件评估我们的工作，并在3种代表性体系结构(IBM Power9, Intel Xeon Phi和Intel Core-i9)上对Pluto编译器生成的大型优化空间进行实验验证。我们的结果表明，我们可以在大多数基准测试中获得与Pluto相当或更好的性能，而无需在源代码中实现平铺或使用实验性自动调优。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Model-driven transformations for multi- and many-core CPUs

Modern polyhedral compilers excel at aggressively optimizing codes with static control parts, but the state-of-practice to find high-performance polyhedral transformations especially for different hardware targets still largely involves auto-tuning. In this work we propose a novel customizable polyhedral scheduling technique, with the aim of delivering high performance for several hardware targets. We design constraints and objectives that model several crucial aspects of performance such as stride optimization or the trade-off between parallelism and reuse, while considering important architectural features of the target machine. We evaluate our work using the PolyBench/C benchmark suite and experimentally validate it against large optimization spaces generated with the Pluto compiler on 3 representative architectures: an IBM Power9, an Intel Xeon Phi and an Intel Core-i9. Our results show we can achieve comparable or superior performance to Pluto on the majority of benchmarks, without implementing tiling in the source code nor using experimental autotuning.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation

自引率

0.00%

发文量