FLOPs作为密集线性代数算法的判别

F. L'opez, L. Karlsson, P. Bientinesi
{"title":"FLOPs作为密集线性代数算法的判别","authors":"F. L'opez, L. Karlsson, P. Bientinesi","doi":"10.1145/3545008.3545072","DOIUrl":null,"url":null,"abstract":"Expressions that involve matrices and vectors, known as linear algebra expressions, are commonly evaluated through a sequence of invocations to highly optimised kernels provided in libraries such as BLAS and LAPACK. A sequence of kernels represents an algorithm, and in general, because of associativity, algebraic identities, and multiple kernels, one expression can be evaluated via many different algorithms. These algorithms are all mathematically equivalent (i.e., in exact arithmetic, they all compute the same result), but often differ noticeably in terms of execution time. When faced with a decision, high-level languages, libraries, and tools such as Julia, Armadillo, and Linnea choose by selecting the algorithm that minimises the FLOP count. In this paper, we test the validity of the FLOP count as a discriminant for dense linear algebra algorithms, analysing ”anomalies”: problem instances for which the fastest algorithm does not perform the least number of FLOPs. To do so, we focused on relatively simple expressions and analysed when and why anomalies occurred. We found that anomalies exist and tend to cluster into large contiguous regions. For one expression anomalies were rare, whereas for the other they were abundant. We conclude that FLOPs is not a sufficiently dependable discriminant even when building algorithms with highly optimised kernels. Plus, most of the anomalies remained as such even after filtering out the inter-kernel cache effects. We conjecture that combining FLOP counts with kernel performance models will significantly improve our ability to choose optimal algorithms.","PeriodicalId":360504,"journal":{"name":"Proceedings of the 51st International Conference on Parallel Processing","volume":"100 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"FLOPs as a Discriminant for Dense Linear Algebra Algorithms\",\"authors\":\"F. L'opez, L. Karlsson, P. Bientinesi\",\"doi\":\"10.1145/3545008.3545072\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Expressions that involve matrices and vectors, known as linear algebra expressions, are commonly evaluated through a sequence of invocations to highly optimised kernels provided in libraries such as BLAS and LAPACK. A sequence of kernels represents an algorithm, and in general, because of associativity, algebraic identities, and multiple kernels, one expression can be evaluated via many different algorithms. These algorithms are all mathematically equivalent (i.e., in exact arithmetic, they all compute the same result), but often differ noticeably in terms of execution time. When faced with a decision, high-level languages, libraries, and tools such as Julia, Armadillo, and Linnea choose by selecting the algorithm that minimises the FLOP count. In this paper, we test the validity of the FLOP count as a discriminant for dense linear algebra algorithms, analysing ”anomalies”: problem instances for which the fastest algorithm does not perform the least number of FLOPs. To do so, we focused on relatively simple expressions and analysed when and why anomalies occurred. We found that anomalies exist and tend to cluster into large contiguous regions. For one expression anomalies were rare, whereas for the other they were abundant. We conclude that FLOPs is not a sufficiently dependable discriminant even when building algorithms with highly optimised kernels. Plus, most of the anomalies remained as such even after filtering out the inter-kernel cache effects. We conjecture that combining FLOP counts with kernel performance models will significantly improve our ability to choose optimal algorithms.\",\"PeriodicalId\":360504,\"journal\":{\"name\":\"Proceedings of the 51st International Conference on Parallel Processing\",\"volume\":\"100 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-07-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 51st International Conference on Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3545008.3545072\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 51st International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3545008.3545072","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

涉及矩阵和向量的表达式(称为线性代数表达式)通常通过对BLAS和LAPACK等库中提供的高度优化的内核的一系列调用来求值。一个核序列表示一个算法,一般来说,由于结合律、代数恒等式和多核,一个表达式可以通过许多不同的算法求值。这些算法在数学上都是等价的(即,在精确的算术中,它们都计算相同的结果),但在执行时间方面通常有显著差异。当面临决策时,高级语言、库和工具(如Julia、Armadillo和Linnea)通过选择最小化FLOP计数的算法来进行选择。在本文中,我们测试了FLOP计数作为密集线性代数算法的判别法的有效性,分析了“异常”:最快的算法不会执行最少数量的FLOPs的问题实例。为此,我们将重点放在相对简单的表达式上,并分析异常发生的时间和原因。我们发现异常存在,并且倾向于聚集成大的连续区域。一种表达异常罕见,而另一种表达异常丰富。我们得出结论,即使在构建具有高度优化内核的算法时,FLOPs也不是一个足够可靠的判别法。此外,即使在过滤掉内核间缓存效果之后,大多数异常仍然存在。我们推测,将FLOP计数与内核性能模型相结合将显著提高我们选择最优算法的能力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
FLOPs as a Discriminant for Dense Linear Algebra Algorithms
Expressions that involve matrices and vectors, known as linear algebra expressions, are commonly evaluated through a sequence of invocations to highly optimised kernels provided in libraries such as BLAS and LAPACK. A sequence of kernels represents an algorithm, and in general, because of associativity, algebraic identities, and multiple kernels, one expression can be evaluated via many different algorithms. These algorithms are all mathematically equivalent (i.e., in exact arithmetic, they all compute the same result), but often differ noticeably in terms of execution time. When faced with a decision, high-level languages, libraries, and tools such as Julia, Armadillo, and Linnea choose by selecting the algorithm that minimises the FLOP count. In this paper, we test the validity of the FLOP count as a discriminant for dense linear algebra algorithms, analysing ”anomalies”: problem instances for which the fastest algorithm does not perform the least number of FLOPs. To do so, we focused on relatively simple expressions and analysed when and why anomalies occurred. We found that anomalies exist and tend to cluster into large contiguous regions. For one expression anomalies were rare, whereas for the other they were abundant. We conclude that FLOPs is not a sufficiently dependable discriminant even when building algorithms with highly optimised kernels. Plus, most of the anomalies remained as such even after filtering out the inter-kernel cache effects. We conjecture that combining FLOP counts with kernel performance models will significantly improve our ability to choose optimal algorithms.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信