A+Store: An Asynchronous Parallel Compaction for Multi-NDP-Enabled Key–Value Store
Hui Sun, Bo Chen, Jiaming Huang, Qiang Wang, Xiaole Liu, Yi Zhou, Yinliang Yue, Xiao Qin
Journal of Systems Architecture, Volume 168, Article 103549, published 2025-08-21
DOI: 10.1016/j.sysarc.2025.103549
Abstract
LSM-tree-based key–value stores suffer from heavy I/O bandwidth consumption and performance bottlenecks caused by frequent data rewrites and migrations during compaction. Near-data processing (NDP) has emerged as a promising solution to this problem and is attracting increasing attention. NDP shortens the data transfer distance between storage and processing resources by placing computational resources close to storage devices or integrating them into memory, thereby alleviating these bottlenecks. However, existing multi-NDP key–value stores still suffer from synchronization problems that lead to long wait times and underutilized resources. To address these issues, we propose A+Store, an asynchronous parallel compaction scheme for multi-NDP-enabled key–value stores. In A+Store, to optimize data layout, we implement an MLSM-tree on each NDP device, an asynchronous execution queue for dynamic task management, and an independent metadata management method. This asynchronous mechanism allows each NDP device to update its metadata immediately after completing a compaction task instead of waiting for other devices, thereby eliminating synchronization waits among NDP devices. Additionally, because each NDP device stores SSTables within a specific key range, it can perform sub-compaction tasks in parallel according to that key range, significantly speeding up task execution within each device. This design improves the system's parallel processing capability and resource utilization, addressing the bottlenecks of existing multi-NDP KV stores in applications that require large-scale data processing and low latency. To evaluate A+Store, we compare it against state-of-the-art KV stores, including PStore, MStore, and RocksDB (configured with a RAID architecture).
We develop a testing toolkit based on the real-world OpenAlex dataset and study the performance of A+Store under realistic workloads. Experimental results show that A+Store delivers superior performance across all tests. For example, when loading 100 GB of writes, A+Store achieves 2.87× the throughput of PStore and 2× that of MStore, while reducing write amplification by 65.3% and 24.8% compared to the NDP-empowered KV stores PStore and MStore, respectively.
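The abstract combines two ideas: every NDP device has its own execution queue and metadata, and compaction within a device's key range is split into sub-compactions that run in parallel. The Python sketch below is a toy model built on our own assumptions. It is not A+Store's implementation. `NDPDevice`, `merge_range`, the fixed per-device key ranges, the equal-width sub-range split, and the integer metadata `version` are all hypothetical.

In the model, each device drains its own queue on its own thread and merges its sub-ranges in parallel. When a task finishes, the device bumps only its own metadata version, so no device ever waits at a cross-device barrier.

```python
import heapq
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

# Toy SSTable (assumption): (key, seqno, value) tuples sorted by key, one entry per key.
SSTable = list[tuple[int, int, str]]


def merge_range(tables: list[SSTable], lo: int, hi: int) -> SSTable:
    """Hypothetical sub-compaction: merge keys in [lo, hi), keeping the newest seqno per key."""
    runs = [[e for e in t if lo <= e[0] < hi] for t in tables]
    out: SSTable = []
    # Order by key, then by descending seqno, so the newest version of a key comes first.
    for key, seq, val in heapq.merge(*runs, key=lambda e: (e[0], -e[1])):
        if not out or out[-1][0] != key:
            out.append((key, seq, val))
    return out


@dataclass
class NDPDevice:
    """Hypothetical NDP device that owns the key range [lo, hi)."""
    name: str
    lo: int
    hi: int
    subranges: int = 4
    tasks: queue.Queue = field(default_factory=queue.Queue)  # device-local execution queue
    version: int = 0  # device-local metadata version
    output: SSTable = field(default_factory=list)
    lock: threading.Lock = field(default_factory=threading.Lock)

    def compact(self, tables: list[SSTable]) -> None:
        # Split this device's key range into equal-width sub-ranges (ceiling division).
        step = max(1, -(-(self.hi - self.lo) // self.subranges))
        bounds = [(s, min(s + step, self.hi)) for s in range(self.lo, self.hi, step)]
        # Sub-ranges are disjoint and ordered, so concatenating the parallel
        # sub-compaction outputs keeps the merged result sorted by key.
        with ThreadPoolExecutor(max_workers=len(bounds)) as pool:
            parts = list(pool.map(lambda b: merge_range(tables, *b), bounds))
        merged = [e for part in parts for e in part]
        with self.lock:
            self.output = merged
            self.version += 1  # publish immediately; no barrier with other devices

    def run(self) -> None:
        # Drain this device's own queue until a None sentinel arrives.
        while (tables := self.tasks.get()) is not None:
            self.compact(tables)


def main() -> list[NDPDevice]:
    devices = [NDPDevice("ndp0", 0, 100), NDPDevice("ndp1", 100, 200)]
    workers = [threading.Thread(target=d.run) for d in devices]
    for w in workers:
        w.start()
    # Each device only receives SSTables whose keys fall inside its own range.
    devices[0].tasks.put([[(1, 1, "a"), (5, 1, "b")], [(1, 2, "a2"), (70, 2, "c")]])
    devices[1].tasks.put([[(150, 1, "x")], [(150, 3, "x3"), (199, 3, "y")]])
    for d in devices:
        d.tasks.put(None)
    for w in workers:
        w.join()
    for d in devices:
        print(d.name, d.version, d.output)
    return devices


if __name__ == "__main__":
    main()
```

CPython's global interpreter lock limits the threads here. They illustrate only the control flow: independent per-device queues, per-device metadata updates, and per-range parallel merges. They do not model real speedup. On actual hardware, each device would run its sub-compactions on its own processor.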
About the journal:
The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software.
Design automation of such systems, including methodologies, techniques and tools for their design, as well as novel designs of software components, falls within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a focus of this journal, hardware/software co-design methods that consider the interplay between software and hardware components with an emphasis on software are also relevant here.