PolyTOPS: Reconfigurable and Flexible Polyhedral Scheduler

2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). Pub Date: 2024-01-12. DOI: 10.1109/CGO57630.2024.10444791

Gianpietro Consolaro, Zhen Zhang, Harenome Razanajato, Nelson Lossing, Nassim Tchoulak, Adilla Susungi, Artur Cesar Araujo Alves, Renwei Zhang, Denis Barthou, Corinne Ancourt, Cedric Bastoul



Abstract

Polyhedral techniques have been widely used for automatic code optimization in low-level compilers and higher-level processes. Loop optimization is central to these techniques, and several polyhedral schedulers, such as Feautrier, Pluto, isl, and Tensor Scheduler, have been proposed, each targeting a different architecture, parallelism model, or application scenario. The need for scenario-specific optimization is growing with the heterogeneity of architectures. One of the most critical cases is that of NPUs (Neural Processing Units) used for AI, which may require loop optimization with different objectives. Another factor to consider is the framework or compiler in which polyhedral optimization takes place. Different scenarios, depending on the target architecture, compilation environment, and application domain, may require different kinds of optimization to best exploit the architecture's feature set. We introduce a new configurable polyhedral scheduler, PolyTOPS, that can be adjusted to various scenarios with straightforward, high-level configurations. This scheduler allows the creation of diverse scheduling strategies that can be both scenario-specific (like state-of-the-art schedulers) and kernel-specific, breaking with the one-size-fits-all scheduler approach. PolyTOPS has been used with isl and CLooG as code generators and has been integrated into the MindSpore AKG deep learning compiler. Experimental results in different scenarios show good performance: a geomean speedup of 7.66× over isl scheduling on MindSpore hybrid custom operators (for the Ascend NPU architecture), and a geomean speedup of up to 1.80× over Pluto scheduling on PolyBench across different multicore architectures. Finally, some comparisons with different state-of-the-art tools are presented in the PolyMage scenario.
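The results above are reported as geomean (geometric mean) speedups. As a minimal sketch of how such a figure is commonly computed, the Python snippet below takes per-kernel running times for a baseline and an optimized schedule and returns the geometric mean of the per-kernel speedup ratios. The function name `geomean_speedup` and the timing values are hypothetical and chosen only for illustration; they are not data from the paper, and the paper's exact measurement methodology is not given in the abstract.

```python
import math


def geomean_speedup(baseline_times, optimized_times):
    """Geometric mean of per-kernel speedups (baseline time / optimized time)."""
    if not baseline_times or len(baseline_times) != len(optimized_times):
        raise ValueError("need two non-empty timing lists of equal length")
    ratios = [b / o for b, o in zip(baseline_times, optimized_times)]
    # Average in log space, then exponentiate: this is the n-th root of the
    # product of the ratios, computed without overflowing on long lists.
    mean_log = sum(math.log(r) for r in ratios) / len(ratios)
    return math.exp(mean_log)


# Hypothetical timings in seconds (illustrative only, not from the paper).
baseline = [8.0, 2.0, 9.0]
optimized = [2.0, 1.0, 9.0]

# Per-kernel speedups are 4, 2, and 1, so the geometric mean is the cube
# root of 8, i.e. approximately 2.
result = geomean_speedup(baseline, optimized)
print(f"geomean speedup: {result:.2f}x")
```

The geometric mean is the usual choice for aggregating ratios because a single kernel with a very large speedup influences it less than it would an arithmetic mean.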
