Cache-oblivious algorithms

Matteo Frigo, C. Leiserson, H. Prokop, S. Ramachandran
{"title":"Cache-oblivious algorithms","authors":"Matteo Frigo, C. Leiserson, H. Prokop, S. Ramachandran","doi":"10.1109/SFFCS.1999.814600","DOIUrl":null,"url":null,"abstract":"This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FFT, and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size Z and cache-line length L where Z=/spl Omega/(L/sup 2/) the number of cache misses for an m/spl times/n matrix transpose is /spl Theta/(1+mn/L). The number of cache misses for either an n-point FFT or the sorting of n numbers is /spl Theta/(1+(n/L)(1+log/sub Z/n)). We also give an /spl Theta/(mnp)-work algorithm to multiply an m/spl times/n matrix by an n/spl times/p matrix that incurs /spl Theta/(1+(mn+np+mp)/L+mnp/L/spl radic/Z) cache faults. We introduce an \"ideal-cache\" model to analyze our algorithms. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model. Can be simulated efficiently by LRU replacement. We also provide preliminary empirical results on the effectiveness of cache-oblivious algorithms in practice.","PeriodicalId":385047,"journal":{"name":"40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039)","volume":"80 1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1999-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1106","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SFFCS.1999.814600","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1106

Abstract

This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FFT, and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size Z and cache-line length L where Z=/spl Omega/(L/sup 2/) the number of cache misses for an m/spl times/n matrix transpose is /spl Theta/(1+mn/L). The number of cache misses for either an n-point FFT or the sorting of n numbers is /spl Theta/(1+(n/L)(1+log/sub Z/n)). We also give an /spl Theta/(mnp)-work algorithm to multiply an m/spl times/n matrix by an n/spl times/p matrix that incurs /spl Theta/(1+(mn+np+mp)/L+mnp/L/spl radic/Z) cache faults. We introduce an "ideal-cache" model to analyze our algorithms. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model. Can be simulated efficiently by LRU replacement. We also provide preliminary empirical results on the effectiveness of cache-oblivious algorithms in practice.
缓存无关算法
本文提出了矩形矩阵转置、FFT和排序的渐近最优算法。与以前的最优算法不同,这些算法是缓存无关的:不需要调优依赖于硬件参数的变量(如缓存大小和缓存行长度)来实现最优性。然而,这些算法使用了最优的工作量,并在多个缓存级别之间以最佳方式移动数据。对于大小为Z和缓存线长度为L的缓存,其中Z=/spl ω /(L/sup 2/),则m/spl乘以/n矩阵的转置的缓存失败次数为/spl Theta/(1+mn/L)。对于n点FFT或n个数字的排序,缓存丢失的次数是/spl Theta/(1+(n/L)(1+log/sub Z/n))。我们还给出了一个/spl Theta/(mnp)-work算法,用于将m/spl乘以/n矩阵乘以n/spl乘以/p矩阵,该矩阵会导致/spl Theta/(1+(mn+np+mp)/L+mnp/L/spl径向/Z)缓存故障。我们引入一个“理想缓存”模型来分析我们的算法。我们证明了为两层内存设计的最优缓存无关算法对于多层内存也是最优的,并且证明了理想缓存模型中最优替换的假设。通过更换LRU,可以有效地进行仿真。我们还在实践中提供了关于缓存无关算法有效性的初步实证结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信