Are Your Epochs Too Epic? Batch Free Can Be Harmful

Daewoo Kim, T. Brown, Ajay Singh
{"title":"Are Your Epochs Too Epic? Batch Free Can Be Harmful","authors":"Daewoo Kim, T. Brown, Ajay Singh","doi":"10.1145/3627535.3638491","DOIUrl":null,"url":null,"abstract":"Epoch based memory reclamation (EBR) is one of the most popular techniques for reclaiming memory in lock-free and optimistic locking data structures, due to its ease of use and good performance in practice. However, EBR is known to be sensitive to thread delays, which can result in performance degradation. Moreover, the exact mechanism for this performance degradation is not well understood. This paper illustrates this performance degradation in a popular data structure benchmark, and does a deep dive to uncover its root cause-a subtle interaction between EBR and state of the art memory allocators. In essence, modern allocators attempt to reduce the overhead of freeing by maintaining bounded thread caches of objects for local reuse, actually freeing them (a very high latency operation) only when thread caches become too large. EBR immediately bypasses these mechanisms whenever a particularly large batch of objects is freed, substantially increasing overheads and latencies. Beyond EBR, many memory reclamation algorithms, and data structures, that reclaim objects in large batches suffer similar deleterious interactions with popular allocators. We propose a simple algorithmic fix for such algorithms to amortize the freeing of large object batches over time, and apply this technique to ten existing memory reclamation algorithms, observing performance improvements for nine out of ten, and over 50% improvement for six out of ten in experiments on a high performance lock-free ABtree. We also present an extremely simple token passing variant of EBR and show that, with our fix, it performs 1.5-2.6x faster than the fastest known memory reclamation algorithm, and 1.2-1.5x faster than not reclaiming at all, on a 192 thread four socket Intel system.","PeriodicalId":286119,"journal":{"name":"ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming","volume":"67 2","pages":"30-41"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3627535.3638491","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Epoch-based memory reclamation (EBR) is one of the most popular techniques for reclaiming memory in lock-free and optimistic locking data structures, due to its ease of use and good performance in practice. However, EBR is known to be sensitive to thread delays, which can result in performance degradation. Moreover, the exact mechanism for this performance degradation is not well understood. This paper illustrates this performance degradation in a popular data structure benchmark, and does a deep dive to uncover its root cause: a subtle interaction between EBR and state-of-the-art memory allocators. In essence, modern allocators attempt to reduce the overhead of freeing by maintaining bounded thread caches of objects for local reuse, actually freeing them (a very high latency operation) only when thread caches become too large. EBR immediately bypasses these mechanisms whenever a particularly large batch of objects is freed, substantially increasing overheads and latencies. Beyond EBR, many memory reclamation algorithms and data structures that reclaim objects in large batches suffer similarly deleterious interactions with popular allocators. We propose a simple algorithmic fix for such algorithms to amortize the freeing of large object batches over time, and apply this technique to ten existing memory reclamation algorithms, observing performance improvements for nine out of ten, and over 50% improvement for six out of ten in experiments on a high-performance lock-free ABtree. We also present an extremely simple token-passing variant of EBR and show that, with our fix, it performs 1.5-2.6x faster than the fastest known memory reclamation algorithm, and 1.2-1.5x faster than not reclaiming at all, on a 192-thread, four-socket Intel system.
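The amortization fix described in the abstract can be pictured as follows: rather than handing an entire expired epoch batch to the allocator in one burst, the reclaimer queues the batch and frees only a small, bounded number of objects per subsequent operation. The sketch below is a minimal illustration of that idea, not the authors' implementation; the class name `DeferredFreeList`, the per-operation budget `kFreePerOp`, and the interface are assumptions made for illustration.

```cpp
// Illustrative sketch (assumed interface, not the paper's code): amortize the
// freeing of a large batch of retired objects instead of calling free() on
// all of them at once, which can overflow the allocator's bounded thread
// cache and force its high-latency slow path.
#include <cstdlib>
#include <deque>
#include <vector>

class DeferredFreeList {
    std::deque<void*> pending_;               // objects whose epoch has expired
    static constexpr size_t kFreePerOp = 32;  // assumed per-operation free budget

public:
    // Called when an entire epoch batch becomes safe to reclaim:
    // queue the batch rather than freeing it immediately.
    void add_batch(std::vector<void*>&& batch) {
        pending_.insert(pending_.end(), batch.begin(), batch.end());
        batch.clear();
    }

    // Called from each subsequent data-structure operation: free at most
    // kFreePerOp objects, spreading the allocator work over many operations.
    void free_some() {
        for (size_t i = 0; i < kFreePerOp && !pending_.empty(); ++i) {
            std::free(pending_.front());
            pending_.pop_front();
        }
    }
};
```

Under this kind of scheme, objects flow back to the allocator at a rate its thread cache can absorb, instead of arriving in a single large spike when an epoch expires.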