Efficient approximations for cache-conscious data placement

Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation. Pub Date: 2022-06-09. DOI: 10.1145/3519939.3523436

A. Ahmadi, Majid Daliri, A. K. Goharshady, Andreas Pavlogiannis

Citations: 5

Abstract

There is a huge and growing gap between the speed of accessing data stored in main memory and data stored in the cache. Thus, cache misses account for a significant portion of runtime overhead in virtually every program, and minimizing them has been an active research topic for decades. The primary and most classical formal model for this problem is Cache-conscious Data Placement (CDP): given a commutative cache with constant capacity k and a sequence Σ of accesses to data elements, the goal is to map each data element to a cache line such that the total number of cache misses over Σ is minimized. Note that we consider an offline, single-threaded setting in which Σ is known a priori. CDP has been widely studied since the 1990s. In POPL 2002, Petrank and Rawitz proved a notoriously strong hardness result: they showed that for every k ≥ 3, CDP is not only NP-hard but also hard to approximate within any non-trivial factor unless P = NP. As a result, all subsequent works gave up on theoretical improvements and instead focused on heuristic algorithms with no theoretical guarantees. In this work, we present the first-ever positive theoretical result for CDP. The fundamental idea behind our approach is that real-world instances of the problem have specific structural properties that can be exploited to obtain efficient algorithms with strong approximation guarantees. Specifically, the access graphs corresponding to many real-world access sequences are sparse and tree-like. This was already well known in the community, but it had only been used to design heuristics without guarantees. In contrast, we provide fixed-parameter tractable algorithms that provably approximate the optimal number of cache misses within any factor 1 + ε, assuming that the access graph of a specific degree d_ε is sparse; that is, sparser real-world instances lead to tighter approximations. Our theoretical results are accompanied by an experimental evaluation in which our approach outperforms past heuristics on small caches with a handful of lines. However, the approach cannot currently handle large real-world caches, and making it scalable in practice is a direction for future work.
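To make the CDP objective concrete, here is a minimal sketch that counts the misses of a fixed placement and finds the optimum by exhaustive search. The cache semantics are an assumption for illustration, not necessarily the paper's exact model: each of the k lines holds one element at a time, and accessing x misses unless x is currently resident in its assigned line, in which case x is loaded there and evicts the previous occupant. The function names `count_misses` and `brute_force_cdp` are hypothetical. The search tries all k^n placements of n elements, so it is exponential and only usable on toy inputs; it is not the paper's fixed-parameter approximation algorithm.

```python
from itertools import product
from typing import Dict, Hashable, Sequence, Tuple

_EMPTY = object()  # sentinel marking an empty line (cold cache)


def count_misses(sigma: Sequence[Hashable], placement: Dict[Hashable, int], k: int) -> int:
    """Count misses of access sequence sigma under a fixed placement.

    Assumed model: each of the k lines holds exactly one element; an access
    to x hits only if x is the current occupant of line placement[x].
    """
    lines = [_EMPTY] * k
    misses = 0
    for x in sigma:
        line = placement[x]
        if lines[line] != x:
            misses += 1       # x is not resident: count a miss
            lines[line] = x   # load x, evicting the previous occupant
    return misses


def brute_force_cdp(sigma: Sequence[Hashable], k: int) -> Tuple[int, Dict[Hashable, int]]:
    """Exact CDP optimum under the assumed model by trying all k**n placements."""
    elements = list(dict.fromkeys(sigma))  # distinct elements, first-occurrence order
    best_cost, best_placement = None, {}
    for assignment in product(range(k), repeat=len(elements)):
        placement = dict(zip(elements, assignment))
        cost = count_misses(sigma, placement, k)
        if best_cost is None or cost < best_cost:
            best_cost, best_placement = cost, placement
    return (0 if best_cost is None else best_cost), best_placement


if __name__ == "__main__":
    sigma = list("abababcc")
    cost, placement = brute_force_cdp(sigma, k=2)
    print(cost, placement)  # 3 {'a': 0, 'b': 1, 'c': 0}
```

For example, with k = 2 and Σ = a b a b a b c c, placing a and b on different lines gives 3 misses (one compulsory miss per element), whereas putting a and b on the same line gives 7 under this model.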