Automated derivation of parametric data movement lower bounds for affine programs

Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation Pub Date : 2019-11-15 DOI:10.1145/3385412.3385989

Auguste Olivry, J. Langou, L. Pouchet, P. Sadayappan, F. Rastello

{"title":"Automated derivation of parametric data movement lower bounds for affine programs","authors":"Auguste Olivry, J. Langou, L. Pouchet, P. Sadayappan, F. Rastello","doi":"10.1145/3385412.3385989","DOIUrl":null,"url":null,"abstract":"Researchers and practitioners have for long worked on improving the computational complexity of algorithms, focusing on reducing the number of operations needed to perform a computation. However the hardware trend nowadays clearly shows a higher performance and energy cost for data movements than computations: quality algorithms have to minimize data movements as much as possible. The theoretical operational complexity of an algorithm is a function of the total number of operations that must be executed, regardless of the order in which they will actually be executed. But theoretical data movement (or, I/O) complexity is fundamentally different: one must consider all possible legal schedules of the operations to determine the minimal number of data movements achievable, a major theoretical challenge. I/O complexity has been studied via complex manual proofs, e.g., refined from Ω(n3/√S) for matrix-multiply on a cache size S by Hong & Kung to 2n3/√S by Smith et al. While asymptotic complexity may be sufficient to compare I/O potential between broadly different algorithms, the accuracy of the reasoning depends on the tightness of these I/O lower bounds. Precisely, exposing constants is essential to enable precise comparison between different algorithms: for example the 2n3/√S lower bound allows to demonstrate the optimality of panel-panel tiling for matrix-multiplication. We present the first static analysis to automatically derive non-asymptotic parametric expressions of data movement lower bounds with scaling constants, for arbitrary affine computations. Our approach is fully automatic, assisting algorithm designers to reason about I/O complexity and make educated decisions about algorithmic alternatives.","PeriodicalId":20580,"journal":{"name":"Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"35 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2019-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3385412.3385989","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 19

Abstract

Researchers and practitioners have for long worked on improving the computational complexity of algorithms, focusing on reducing the number of operations needed to perform a computation. However the hardware trend nowadays clearly shows a higher performance and energy cost for data movements than computations: quality algorithms have to minimize data movements as much as possible. The theoretical operational complexity of an algorithm is a function of the total number of operations that must be executed, regardless of the order in which they will actually be executed. But theoretical data movement (or, I/O) complexity is fundamentally different: one must consider all possible legal schedules of the operations to determine the minimal number of data movements achievable, a major theoretical challenge. I/O complexity has been studied via complex manual proofs, e.g., refined from Ω(n3/√S) for matrix-multiply on a cache size S by Hong & Kung to 2n3/√S by Smith et al. While asymptotic complexity may be sufficient to compare I/O potential between broadly different algorithms, the accuracy of the reasoning depends on the tightness of these I/O lower bounds. Precisely, exposing constants is essential to enable precise comparison between different algorithms: for example the 2n3/√S lower bound allows to demonstrate the optimality of panel-panel tiling for matrix-multiplication. We present the first static analysis to automatically derive non-asymptotic parametric expressions of data movement lower bounds with scaling constants, for arbitrary affine computations. Our approach is fully automatic, assisting algorithm designers to reason about I/O complexity and make educated decisions about algorithmic alternatives.

查看原文本刊更多论文

仿射程序参数数据移动下界的自动推导

长期以来，研究人员和实践者一直致力于提高算法的计算复杂性，重点是减少执行计算所需的操作数量。然而，如今的硬件趋势清楚地表明，与计算相比，数据移动的性能和能源成本更高:高质量的算法必须尽可能地减少数据移动。算法的理论操作复杂性是必须执行的操作总数的函数，而不管它们实际执行的顺序如何。但理论上的数据移动(或I/O)复杂性是根本不同的:必须考虑所有可能的合法操作时间表，以确定可实现的最小数据移动数量，这是一个主要的理论挑战。I/O复杂度已经通过复杂的手工证明进行了研究，例如，Hong & Kung从Ω(n3/√S)对缓存大小为S的矩阵乘法进行了改进，到Smith等人的2n3/√S。虽然渐近复杂性可能足以比较广泛不同算法之间的I/O潜力，但推理的准确性取决于这些I/O下界的紧密性。准确地说，暴露常数对于实现不同算法之间的精确比较至关重要:例如，2n3/√S下界允许演示面板-面板平铺矩阵乘法的最优性。我们提出了第一个静态分析，以自动导出具有缩放常数的数据移动下界的非渐近参数表达式，用于任意仿射计算。我们的方法是全自动的，帮助算法设计者推断I/O复杂性，并对算法替代方案做出明智的决策。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation

自引率

0.00%

发文量