D2Comp: Efficient Offload of LSM-tree Compaction with Data Processing Units on Disaggregated Storage

Impact factor 1.5; Zone 3 (Computer Science); JCR Q4 (Computer Science, Hardware & Architecture)
Chen Ding, Jian Zhou, Kai Lu, Sicen Li, Yiqin Xiong, Jiguang Wan, Ling Zhan
Journal: ACM Transactions on Architecture and Code Optimization
Published: 2024-04-09
DOI: 10.1145/3656584 (https://doi.org/10.1145/3656584)

Abstract

LSM-based key-value stores suffer from sub-optimal performance because their background compactions are slow and heavy. On high-speed disaggregated storage, compaction incurs severe CPU and network overhead. This paper further reveals that data-intensive compression during compaction consumes a significant share of CPU resources. Moreover, multi-threaded compactions cause substantial CPU contention and network traffic during high-load periods. Based on these observations, we propose fine-grained dynamic compaction offloading that leverages the modern Data Processing Unit (DPU) to alleviate the CPU and network overhead. To achieve this, we first customize a file system to enable efficient data access for the DPU. We then use the Arm cores on the DPU to absorb bursts in CPU and network demand, reducing resource contention and data movement. We further employ dedicated hardware accelerators on the DPU to speed up compression during compaction. We integrate our DPU-offloaded compaction with RocksDB and evaluate it with NVIDIA's latest BlueField-2 DPU on a real system. The evaluation shows that the DPU is an effective way to relieve the CPU bottleneck and reduce the data traffic of compaction. Compared to a fine-tuned CPU-only baseline, compaction is accelerated by 2.86 to 4.03 times, system write and read throughput improve by up to 3.2 times and 1.4 times respectively, and host CPU contention and network traffic are effectively reduced.
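
To make the cost structure described in the abstract concrete, here is a minimal sketch of the two steps a compaction performs: merging sorted runs and compressing the output blocks. It is not the D2Comp or RocksDB implementation. The `Entry` layout and the names `merge_runs` and `compress_blocks` are hypothetical, `zlib` stands in for whatever codec a real store uses, and tombstones are dropped unconditionally, which is only valid when compacting into the bottom level.

```python
import heapq
import zlib
from typing import Iterable, List, Optional, Tuple

# Hypothetical entry layout for illustration: (key, sequence number, value).
# A value of None marks a tombstone (a deleted key).
Entry = Tuple[bytes, int, Optional[bytes]]


def merge_runs(runs: Iterable[List[Entry]]) -> List[Tuple[bytes, bytes]]:
    """Merge sorted runs, keeping the newest version of each key."""
    # Sort by key ascending, then sequence number descending, so the
    # newest version of every key is seen first.
    merged = heapq.merge(*runs, key=lambda e: (e[0], -e[1]))
    out: List[Tuple[bytes, bytes]] = []
    last_key = None
    for key, _seq, value in merged:
        if key == last_key:
            continue  # older version of a key that was already handled
        last_key = key
        if value is not None:  # simplification: drop tombstones outright
            out.append((key, value))
    return out


def encode_block(pairs: List[Tuple[bytes, bytes]]) -> bytes:
    """Length-prefix each key and value and concatenate them."""
    return b"".join(
        len(k).to_bytes(4, "big") + k + len(v).to_bytes(4, "big") + v
        for k, v in pairs
    )


def compress_blocks(pairs: List[Tuple[bytes, bytes]],
                    block_entries: int = 2) -> List[bytes]:
    """Split merged pairs into fixed-size blocks and compress each one.

    This is the data-intensive step that the abstract identifies as a
    major CPU consumer; zlib is an assumption made for the sketch.
    """
    return [
        zlib.compress(encode_block(pairs[i:i + block_entries]))
        for i in range(0, len(pairs), block_entries)
    ]


newer_run: List[Entry] = [(b"a", 5, b"A2"), (b"c", 6, None)]
older_run: List[Entry] = [(b"a", 1, b"A1"), (b"b", 2, b"B1"), (b"c", 3, b"C1")]

merged_pairs = merge_runs([newer_run, older_run])
blocks = compress_blocks(merged_pairs)
```

Both steps touch every byte of the input runs, so on disaggregated storage the host pays for fetching the data over the network and for compressing it. Running the same loop on a device that sits closer to the storage path is the kind of offloading the abstract proposes.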

Source journal
ACM Transactions on Architecture and Code Optimization (Engineering & Technology; Computer Science: Theory & Methods)
CiteScore: 3.60
Self-citation rate: 6.20%
Articles published: 78
Review time: 6-12 weeks
Journal description: ACM Transactions on Architecture and Code Optimization (TACO) focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. Articles that appear in TACO will either present new techniques and concepts or report on experiences and experiments with actual systems. Insights useful to architects, hardware or software developers, designers, builders, and users will be emphasized.