Automated derivation of parametric data movement lower bounds for affine programs

Auguste Olivry, J. Langou, L. Pouchet, P. Sadayappan, F. Rastello
{"title":"Automated derivation of parametric data movement lower bounds for affine programs","authors":"Auguste Olivry, J. Langou, L. Pouchet, P. Sadayappan, F. Rastello","doi":"10.1145/3385412.3385989","DOIUrl":null,"url":null,"abstract":"Researchers and practitioners have for long worked on improving the computational complexity of algorithms, focusing on reducing the number of operations needed to perform a computation. However the hardware trend nowadays clearly shows a higher performance and energy cost for data movements than computations: quality algorithms have to minimize data movements as much as possible. The theoretical operational complexity of an algorithm is a function of the total number of operations that must be executed, regardless of the order in which they will actually be executed. But theoretical data movement (or, I/O) complexity is fundamentally different: one must consider all possible legal schedules of the operations to determine the minimal number of data movements achievable, a major theoretical challenge. I/O complexity has been studied via complex manual proofs, e.g., refined from Ω(n3/√S) for matrix-multiply on a cache size S by Hong & Kung to 2n3/√S by Smith et al. While asymptotic complexity may be sufficient to compare I/O potential between broadly different algorithms, the accuracy of the reasoning depends on the tightness of these I/O lower bounds. Precisely, exposing constants is essential to enable precise comparison between different algorithms: for example the 2n3/√S lower bound allows to demonstrate the optimality of panel-panel tiling for matrix-multiplication. We present the first static analysis to automatically derive non-asymptotic parametric expressions of data movement lower bounds with scaling constants, for arbitrary affine computations. Our approach is fully automatic, assisting algorithm designers to reason about I/O complexity and make educated decisions about algorithmic alternatives.","PeriodicalId":20580,"journal":{"name":"Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2019-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3385412.3385989","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 19

Abstract

Researchers and practitioners have for long worked on improving the computational complexity of algorithms, focusing on reducing the number of operations needed to perform a computation. However the hardware trend nowadays clearly shows a higher performance and energy cost for data movements than computations: quality algorithms have to minimize data movements as much as possible. The theoretical operational complexity of an algorithm is a function of the total number of operations that must be executed, regardless of the order in which they will actually be executed. But theoretical data movement (or, I/O) complexity is fundamentally different: one must consider all possible legal schedules of the operations to determine the minimal number of data movements achievable, a major theoretical challenge. I/O complexity has been studied via complex manual proofs, e.g., refined from Ω(n3/√S) for matrix-multiply on a cache size S by Hong & Kung to 2n3/√S by Smith et al. While asymptotic complexity may be sufficient to compare I/O potential between broadly different algorithms, the accuracy of the reasoning depends on the tightness of these I/O lower bounds. Precisely, exposing constants is essential to enable precise comparison between different algorithms: for example the 2n3/√S lower bound allows to demonstrate the optimality of panel-panel tiling for matrix-multiplication. We present the first static analysis to automatically derive non-asymptotic parametric expressions of data movement lower bounds with scaling constants, for arbitrary affine computations. Our approach is fully automatic, assisting algorithm designers to reason about I/O complexity and make educated decisions about algorithmic alternatives.
仿射程序参数数据移动下界的自动推导
长期以来,研究人员和实践者一直致力于提高算法的计算复杂性,重点是减少执行计算所需的操作数量。然而,如今的硬件趋势清楚地表明,与计算相比,数据移动的性能和能源成本更高:高质量的算法必须尽可能地减少数据移动。算法的理论操作复杂性是必须执行的操作总数的函数,而不管它们实际执行的顺序如何。但理论上的数据移动(或I/O)复杂性是根本不同的:必须考虑所有可能的合法操作时间表,以确定可实现的最小数据移动数量,这是一个主要的理论挑战。I/O复杂度已经通过复杂的手工证明进行了研究,例如,Hong & Kung从Ω(n3/√S)对缓存大小为S的矩阵乘法进行了改进,到Smith等人的2n3/√S。虽然渐近复杂性可能足以比较广泛不同算法之间的I/O潜力,但推理的准确性取决于这些I/O下界的紧密性。准确地说,暴露常数对于实现不同算法之间的精确比较至关重要:例如,2n3/√S下界允许演示面板-面板平铺矩阵乘法的最优性。我们提出了第一个静态分析,以自动导出具有缩放常数的数据移动下界的非渐近参数表达式,用于任意仿射计算。我们的方法是全自动的,帮助算法设计者推断I/O复杂性,并对算法替代方案做出明智的决策。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
文献相关原料
公司名称 产品信息 采购帮参考价格
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信