Shiqiang Nie , Chi Zhang , Menghan Li , Fangxing Yu , Yaming Li , Weiguo Wu
ZoomDB: Building cost-effective key–value store engine on ZNS SSD and SMR HDD
Journal of Systems Architecture, volume 167, Article 103465. Published 2025-06-18. DOI: 10.1016/j.sysarc.2025.103465
https://www.sciencedirect.com/science/article/pii/S1383762125001377
Citations: 0
Abstract
Log-structured merge-tree (LSM-tree) based key–value (KV) stores have become critical components for managing data in write-intensive cloud applications. With the explosive growth of unstructured data, emerging host-managed zoned storage devices, such as the high-performance Zoned Namespace Solid State Drive (ZNS SSD) and the large-capacity Shingled Magnetic Recording Hard Disk Drive (SMR HDD), present an opportunity for efficient data storage. The state-of-the-art scheme partitions the LSM-tree across hybrid storage, placing lower levels on high-performance devices and higher levels on large-capacity devices. However, it fails to address the data layout and garbage collection challenges of a hybrid storage system equipped with ZNS SSDs and SMR HDDs.
In this paper, we propose ZoomDB, an LSM-tree KV store engine designed around KV separation and tailored for hybrid zoned storage devices. First, we integrate KV separation with zone management in LSM-tree-based hybrid storage. Specifically, keys and low-level values are placed in high-performance zones on ZNS SSDs, while high-level values are stored in large-capacity zones on SMR HDDs, optimizing both performance and storage efficiency. Second, to further improve data management, we introduce a hotness identification mechanism that classifies values by access frequency and stores hot and cold values in separate zones. Finally, we propose a diversity garbage collection (GC) scheme tailored to zones with varying access frequencies, which reduces data migration overhead. We implement and evaluate ZoomDB on real ZNS SSD and SMR HDD devices. The evaluation results show that ZoomDB reduces the number of GC-triggered writes by 77.5% on average compared to WiscKey. It achieves throughput gains of 1.79×, 3.13×, 4.01×, 4.25×, and 4.32× over WiscKey+, WiscKey, GearDB, ZoneKV, and LevelDB, respectively.
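The placement rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `place_key`, `place_value`, `ZoneClass`, `SSD_MAX_LEVEL`, and `HOT_THRESHOLD` are hypothetical, the threshold values are invented for illustration, and the sketch assumes that "low-level" and "high-level" values refer to the LSM-tree level at which the corresponding key currently resides.

```python
from dataclasses import dataclass
from enum import Enum


class Device(Enum):
    ZNS_SSD = "zns_ssd"   # high-performance zoned device
    SMR_HDD = "smr_hdd"   # large-capacity zoned device


@dataclass(frozen=True)
class ZoneClass:
    """Which kind of zone a value should be appended to."""
    device: Device
    hot: bool


# Hypothetical tuning knobs; the abstract gives no concrete values.
SSD_MAX_LEVEL = 2   # values at LSM-tree levels 0..2 stay on the ZNS SSD
HOT_THRESHOLD = 4   # access count at or above which a value counts as hot


def place_key() -> Device:
    """Keys always go to the high-performance ZNS SSD."""
    return Device.ZNS_SSD


def place_value(level: int, access_count: int) -> ZoneClass:
    """Pick a device by LSM-tree level and a hot/cold zone by access count."""
    device = Device.ZNS_SSD if level <= SSD_MAX_LEVEL else Device.SMR_HDD
    return ZoneClass(device=device, hot=access_count >= HOT_THRESHOLD)
```

Keeping hot and cold values in separate zones means that a zone's contents tend to become invalid at a similar rate, which is what lets a GC policy treat zones differently and migrate less live data.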
About the journal:
The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software.
Design automation of such systems, including methodologies, techniques, and tools for their design, as well as novel designs of software components, falls within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components, with an emphasis on software, are also relevant here.