Autoscheduling for sparse tensor algebra with an asymptotic cost model

Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation. Pub Date: 2022-06-09. DOI: 10.1145/3519939.3523442

Peter Ahrens, Fredrik Kjolstad, Saman P. Amarasinghe


Citations: 14

Abstract

While loop reordering and fusion can have a large impact on the constant-factor performance of dense tensor programs, the effects on sparse tensor programs are asymptotic, often leading to orders-of-magnitude performance differences in practice. Sparse tensors also introduce a choice of compressed storage formats that can have asymptotic effects. Research into sparse tensor compilers has led to simplified languages that express these tradeoffs, but the user is expected to provide a schedule that makes the decisions. This is challenging because schedulers must anticipate the interaction between sparse formats, loop structure, potential sparsity patterns, and the compiler itself. Automating this decision-making process stands to finally make sparse tensor compilers accessible to end users. We present, to the best of our knowledge, the first automatic asymptotic scheduler for sparse tensor programs. We provide an approach to abstractly represent the asymptotic cost of schedules and to choose between them. We narrow down the search space to a manageably small Pareto frontier of asymptotically non-dominating kernels. We test our approach by compiling these kernels with the TACO sparse tensor compiler and comparing them with those generated with the default TACO schedules. Our results show that our approach reduces the scheduling space by orders of magnitude and that the generated kernels perform asymptotically better than those generated using the default schedules.
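
The idea of pruning candidate schedules to a Pareto frontier of non-dominated kernels can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not specify the paper's cost representation, so each cost is modeled as a hypothetical pair of exponents (a, b) standing in for a bound of the form O(n^a * nnz^b), and the schedule names and costs are invented. It is not the authors' algorithm and does not use the TACO API.

```python
from typing import Dict, Tuple

# Hypothetical cost: (exponent of dimension n, exponent of nonzero count nnz),
# standing in for an asymptotic bound of the form O(n^a * nnz^b).
Cost = Tuple[int, int]


def dominates(a: Cost, b: Cost) -> bool:
    """True if a is no worse than b in every exponent and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_frontier(candidates: Dict[str, Cost]) -> Dict[str, Cost]:
    """Keep only the candidates that no other candidate dominates."""
    return {
        name: cost
        for name, cost in candidates.items()
        if not any(dominates(other, cost) for other in candidates.values())
    }


# Invented schedules and costs, for illustration only.
schedules = {
    "dense_loops": (2, 0),   # O(n^2)
    "sparse_outer": (1, 1),  # O(n * nnz)
    "sparse_only": (0, 1),   # O(nnz)
    "workspace": (1, 0),     # O(n)
}

frontier = pareto_frontier(schedules)
print(sorted(frontier))
```

In this toy model, a schedule is discarded only when another schedule is at least as good in every exponent, so incomparable schedules (one better in n, the other better in nnz) both remain on the frontier. The sketch treats n and nnz as independent and ignores relations between them such as nnz being bounded by the tensor size.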

