A Scalable and Write-Optimized Disaggregated B+-Tree With Adaptive Cache Assistance

IF 5.3 | CAS Zone 2 (Computer Science) | JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS
Hang An;Fang Wang;Dan Feng;Xiaomin Zou;Zefeng Liu;Jianshun Zhang
{"title":"A Scalable and Write-Optimized Disaggregated B+-Tree With Adaptive Cache Assistance","authors":"Hang An;Fang Wang;Dan Feng;Xiaomin Zou;Zefeng Liu;Jianshun Zhang","doi":"10.1109/TCC.2024.3437472","DOIUrl":null,"url":null,"abstract":"Disaggregated memory (DM) architecture separates CPU and DRAM into computing/memory resource pools and interconnects them with high-speed networks. Storage systems on DM locate data by distributed index. However, existing distributed indexes either suffer from prohibitive synchronization overhead of write operation or sacrifice the performance of read operation, resulting in low throughput, high tail latency, and challenging trade-off. In this paper, we present Marlin+, a scalable and optimized B+-tree on DM. Marlin+ provides atomic granularity synchronization between write operations via three strategies: 1) a concurrent algorithm that is friendly to IDU operations (Insert, Delete, and Update), enabling different clients to concurrently operate on the same leaf node, 2) shared-exclusive leaf node lock, effectively preventing conflicts between index structure modification operation (SMO) and IDU operations, and 3) critical path compression of write to reduce latency of write operation. Moreover, Marlin+ proposes an adaptive remote address cache to accelerate the access of hot data. Compared to the state-of-the-art schemes based on DM, Marlin achieves 2.21× higher throughput and 83.4% lower P99 latency under YCSB hybrid workloads. Compared to Marlin, Marlin+ improves the throughput by up to 1.58× and reduces the P50 latency by up to 50.5% under YCSB read-intensive workloads.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 4","pages":"1074-1087"},"PeriodicalIF":5.3000,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cloud Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10621579/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Disaggregated memory (DM) architecture separates CPUs and DRAM into compute and memory resource pools and interconnects them with high-speed networks. Storage systems on DM locate data through distributed indexes. However, existing distributed indexes either suffer prohibitive synchronization overhead for write operations or sacrifice read performance, resulting in low throughput, high tail latency, and a difficult trade-off. In this paper, we present Marlin+, a scalable and write-optimized B+-tree on DM. Marlin+ provides atomic-granularity synchronization between write operations via three strategies: 1) a concurrent algorithm friendly to IDU operations (Insert, Delete, and Update), enabling different clients to operate concurrently on the same leaf node; 2) a shared-exclusive leaf node lock, effectively preventing conflicts between structure modification operations (SMOs) and IDU operations; and 3) critical-path compression of writes to reduce write latency. Moreover, Marlin+ proposes an adaptive remote address cache to accelerate access to hot data. Compared to state-of-the-art DM-based schemes, Marlin achieves 2.21× higher throughput and 83.4% lower P99 latency under YCSB hybrid workloads. Compared to Marlin, Marlin+ improves throughput by up to 1.58× and reduces P50 latency by up to 50.5% under YCSB read-intensive workloads.
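To make the shared-exclusive leaf lock idea concrete, the sketch below shows one way such a lock could be packed into a single 64-bit word so it can be acquired with one compare-and-swap (locally emulated here with std::atomic in place of a one-sided RDMA CAS on remote memory). The class name, bit layout, and retry policy are illustrative assumptions, not Marlin+'s actual design: IDU operations take the lock in shared mode so different clients can modify the same leaf concurrently, while an SMO such as a node split takes it exclusively.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative sketch (not the paper's implementation): a shared-exclusive
// leaf lock packed into one 64-bit word, so a single compare-and-swap is
// enough to acquire it. On disaggregated memory the CAS would be a one-sided
// RDMA atomic on the 8-byte word stored with the leaf node; std::atomic is
// used here as a local stand-in.
//
//   bit 63     : exclusive bit, held by a structure modification operation (SMO)
//   bits 0..62 : count of concurrent IDU holders (Insert/Delete/Update)
class SharedExclusiveLeafLock {
  static constexpr uint64_t kExclusiveBit = 1ull << 63;
  std::atomic<uint64_t> word_{0};  // stand-in for the remote 8-byte lock word

 public:
  // IDU path: shared acquisition, so different clients may operate on the
  // same leaf at the same time, as long as no SMO holds the exclusive bit.
  bool try_lock_shared() {
    uint64_t cur = word_.load(std::memory_order_relaxed);
    while ((cur & kExclusiveBit) == 0) {
      if (word_.compare_exchange_weak(cur, cur + 1, std::memory_order_acquire))
        return true;   // holder count incremented
      // cur was refreshed by the failed CAS; the loop re-checks the exclusive bit
    }
    return false;      // an SMO is in progress; the caller would retry
  }

  void unlock_shared() { word_.fetch_sub(1, std::memory_order_release); }

  // SMO path: exclusive acquisition succeeds only when no IDU holder remains.
  bool try_lock_exclusive() {
    uint64_t expected = 0;
    return word_.compare_exchange_strong(expected, kExclusiveBit,
                                         std::memory_order_acquire);
  }

  void unlock_exclusive() { word_.store(0, std::memory_order_release); }
};
```

Keeping the whole lock state in 8 bytes is what makes such a scheme compatible with one-sided RDMA atomics, which operate on 64-bit words; how Marlin+ actually encodes its lock and combines it with the IDU-friendly concurrent algorithm and critical-path compression is detailed in the paper itself.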
Source Journal

IEEE Transactions on Cloud Computing (Computer Science - Software)
CiteScore: 9.40
Self-citation rate: 6.20%
Articles per year: 167
About the journal: The IEEE Transactions on Cloud Computing (TCC) is dedicated to the multidisciplinary field of cloud computing. It is committed to the publication of articles that present innovative research ideas, application results, and case studies in cloud computing, focusing on key technical issues related to theory, algorithms, systems, applications, and performance.