Cache-Adaptive Analysis

Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures
Pub Date: 2013-07-01
DOI: 10.1145/2935764.2935798

M. A. Bender, E. Demaine, Roozbeh Ebrahimi, Jeremy T. Fineman, Rob Johnson, Andrea Lincoln, J. Lynch, Samuel McCauley


Citations: 10

Abstract

Memory efficiency and locality have a substantial impact on the performance of programs, particularly when operating on large data sets. Thus, memory- or I/O-efficient algorithms have received significant attention both in theory and practice. The widespread deployment of multicore machines, however, brings new challenges. Specifically, since the memory (RAM) is shared across multiple processes, the effective memory size allocated to each process fluctuates over time. This paper presents techniques for designing and analyzing algorithms in a cache-adaptive setting, where the RAM available to the algorithm changes over time. These techniques make analyzing algorithms in the cache-adaptive model almost as easy as in the external-memory, or DAM, model. Our techniques enable us to analyze a wide variety of algorithms: Master-Method-style algorithms, Akra-Bazzi-style algorithms, collections of mutually recursive algorithms, and algorithms, such as FFT, that break problems of size N into subproblems of size Θ(N^c). We demonstrate the effectiveness of these techniques by deriving several results:

1. We give a simple recipe for determining whether common divide-and-conquer cache-oblivious algorithms are optimally cache-adaptive.
2. We show how to bound an algorithm's non-optimality. We give a tight analysis showing that a class of cache-oblivious algorithms is a logarithmic factor worse than optimal.
3. We show the generality of our techniques by analyzing the cache-oblivious FFT algorithm, which is not covered by the above theorems. Nonetheless, the same general techniques can show that it is at most O(log log N) away from optimal in the cache-adaptive setting, and that this bound is tight.

These general theorems give concrete results about several algorithms that could not be analyzed using earlier techniques. For example, our results apply to Fast Fourier Transform, matrix multiplication, Jacobi Multipass Filter, and cache-oblivious dynamic-programming algorithms, such as Longest Common Subsequence and Edit Distance. Our results also give algorithm designers clear guidelines for creating optimally cache-adaptive algorithms.
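As a rough illustration of the kind of divide-and-conquer cache-oblivious algorithm the abstract refers to, here is a minimal sketch of recursive matrix multiplication, one of the applications it lists. This is not code from the paper: the function names, the in-place accumulation into `c`, and the power-of-two size restriction are assumptions made for brevity. The recursion has the Master-method shape, with eight subproblems of half the dimension and constant extra work per call, and it never refers to a memory or block size.

```python
from typing import List

Matrix = List[List[float]]


def _matmul_rec(a: Matrix, b: Matrix, c: Matrix,
                ai: int, aj: int, bi: int, bj: int,
                ci: int, cj: int, n: int) -> None:
    """Accumulate the n x n block product A_block @ B_block into C_block.

    Blocks are given by their top-left corners, so no submatrix is copied.
    Hypothetical helper for this sketch only.
    """
    if n == 1:
        # Base case: a single scalar multiply-add.
        c[ci][cj] += a[ai][aj] * b[bi][bj]
        return
    h = n // 2
    # Eight half-size subproblems: C[di][dj] += A[di][dk] * B[dk][dj].
    for di in (0, h):
        for dj in (0, h):
            for dk in (0, h):
                _matmul_rec(a, b, c,
                            ai + di, aj + dk,
                            bi + dk, bj + dj,
                            ci + di, cj + dj, h)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two square matrices whose dimension is a power of two."""
    n = len(a)
    # Assumption of this sketch: power-of-two dimension, no padding logic.
    assert n > 0 and n & (n - 1) == 0, "dimension must be a power of two"
    c = [[0.0] * n for _ in range(n)]
    _matmul_rec(a, b, c, 0, 0, 0, 0, 0, 0, n)
    return c
```

Calling `matmul(x, y)` on two square lists of lists returns their product as a new matrix. Because the code only recurses on halves, the same program runs unchanged whatever memory happens to be available, which is the setting the cache-adaptive analysis addresses.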

