Chenxing Li, Sidi Mohamed Beillahi, Guang Yang, Ming Wu, Wei Xu, Fan Long
{"title":"LVMT: An Efficient Authenticated Storage for Blockchain","authors":"Chenxing Li, Sidi Mohamed Beillahi, Guang Yang, Ming Wu, Wei Xu, Fan Long","doi":"10.1145/3664818","DOIUrl":"https://doi.org/10.1145/3664818","url":null,"abstract":"<p>Authenticated storage access is the performance bottleneck of a blockchain, because each access can be amplified to potentially <i>O</i>(log <i>n</i>) disk I/O operations in the standard Merkle Patricia Trie (MPT) storage structure. In this paper, we propose a multi-Layer Versioned Multipoint Trie (LVMT), a novel high-performance blockchain storage with significantly reduced I/O amplifications. LVMT uses the authenticated multipoint evaluation tree (AMT) vector commitment protocol to update commitment proofs in constant time. LVMT adopts a multi-layer design to support unlimited key-value pairs and stores version numbers instead of value hashes to avoid costly elliptic curve multiplication operations. In our experiment, LVMT outperforms the MPT in real Ethereum traces, delivering read and write operations six times faster. It also boosts blockchain system execution throughput by up to 2.7 times.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"8 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141059453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Design of Fast Delta Encoding for Delta Compression Based Storage Systems","authors":"Haoliang Tan, Wen Xia, Xiangyu Zou, Cai Deng, Qing Liao, Zhaoquan Gu","doi":"10.1145/3664817","DOIUrl":"https://doi.org/10.1145/3664817","url":null,"abstract":"<p>Delta encoding is a data reduction technique capable of calculating the differences (i.e., delta) among very similar files and chunks. It is widely used for various applications, such as synchronization replication, backup/archival storage, cache compression, etc. However, delta encoding is computationally costly due to its time-consuming word-matching operations for delta calculation. Existing delta encoding approaches either run at a slow encoding speed, such as Xdelta and Zdelta, or at a low compression ratio, such as Ddelta and Edelta. In this paper, we propose Gdelta, a fast delta encoding approach with a high compression ratio. The key idea behind Gdelta is the combined use of five techniques: (1) employing an improved Gear-based rolling hash to replace Adler32 hash for fast scanning overlapping words of similar chunks, (2) adopting a quick array-based indexing for word-matching, (3) applying a sampling indexing scheme to reduce the cost of traditional building full indexes for base chunks’ words, (4) skipping unmatched words to accelerate delta encoding through non-redundant areas, and (5) last but not least, after word-matching, further batch compressing the remainder to improve the compression ratio. 
Our evaluation results driven by seven real-world datasets suggest that Gdelta achieves encoding/decoding speedups of 3.5X ∼ 25X over the classic Xdelta and Zdelta approaches while increasing the compression ratio by about 10% ∼ 240%.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"4 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140928573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Memory-Disaggregated Radix Tree","authors":"Xuchuan Luo, Pengfei Zuo, Jiacheng Shen, Jiazhen Gu, Xin Wang, Michael Lyu, Yangfan Zhou","doi":"10.1145/3664289","DOIUrl":"https://doi.org/10.1145/3664289","url":null,"abstract":"<p>Disaggregated memory (DM) is an increasingly prevalent architecture with high resource utilization. It separates computing and memory resources into two pools and interconnects them with fast networks. Existing range indexes on DM are based on B+ trees, which suffer from large inherent read and write amplifications. The read and write amplifications rapidly saturate the network bandwidth, resulting in low request throughput and high access latency of B+ trees on DM. </p><p>In this paper, we propose that the radix tree is more suitable for DM than the B+ tree due to smaller read and write amplifications. However, constructing a radix tree on DM is challenging due to the costly lock-based concurrency control, the bounded memory-side IOPS, and the complicated computing-side cache validation. To address these challenges, we design <b>SMART</b>, the first radix tree for disaggregated memory with high performance. Specifically, we leverage 1) a <i>hybrid concurrency control</i> scheme including lock-free internal nodes and fine-grained lock-based leaf nodes to reduce lock overhead, 2) a computing-side <i>read-delegation and write-combining</i> technique to break through the IOPS upper bound by reducing redundant I/Os, and 3) a simple yet effective <i>reverse check</i> mechanism for computing-side cache validation. 
Experimental results show that SMART achieves 6.1 × higher throughput under typical write-intensive workloads and 2.8 × higher throughput under read-only workloads in YCSB benchmarks, compared with state-of-the-art B+ trees on DM.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"41 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140928577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiahao Li, Jingbo Su, Luofan Chen, Cheng Li, Kai Zhang, Liang Yang, Sam Noh, Yinlong Xu
{"title":"Fastmove: A Comprehensive Study of On-Chip DMA and its Demonstration for Accelerating Data Movement in NVM-based Storage Systems","authors":"Jiahao Li, Jingbo Su, Luofan Chen, Cheng Li, Kai Zhang, Liang Yang, Sam Noh, Yinlong Xu","doi":"10.1145/3656477","DOIUrl":"https://doi.org/10.1145/3656477","url":null,"abstract":"<p>Data-intensive applications executing on NVM-based storage systems experience serious bottlenecks when moving data between DRAM and NVM. We advocate for the use of the long-existing but recently neglected on-chip DMA to expedite data movement with three contributions. First, we explore new latency-oriented optimization directions, driven by a comprehensive DMA study, to design a high-performance DMA module, which significantly lowers the I/O size threshold to observe benefits. Second, we propose a new data movement engine, <monospace>Fastmove</monospace>, that coordinates the use of the DMA along with the CPU with DDIO-aware strategies, judicious scheduling and load splitting such that the DMA’s limitations are compensated, and the overall gains are maximized. Finally, with a general kernel-based design, simple APIs, and DAX file system integration, <monospace>Fastmove</monospace> allows applications to transparently exploit the DMA and its new features without code change. We run three data-intensive applications (MySQL, GraphWalker, and Filebench) atop <monospace>NOVA</monospace>, <monospace>ext4-DAX</monospace>, and <monospace>XFS-DAX</monospace>, with standard benchmarks like TPC-C, and popular graph algorithms like PageRank. Across single- and multi-socket settings, compared to the conventional CPU-only NVM accesses, <monospace>Fastmove</monospace> improves the peak throughput of TPC-C with MySQL by 1.13-2.16 ×, reduces the average latency by 17.7-60.8%, and saves 37.1-68.9% of the CPU usage spent in data movement. 
It also shortens the execution time of graph algorithms with GraphWalker by 39.7-53.4%, and introduces 1.01-1.48 × throughput speedups for Filebench.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"61 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140881717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chunfeng Du, Zihang Lin, Suzhen Wu, Yifei Chen, Jiapeng Wu, Shengzhe Wang, Weichun Wang, Qingfeng Wu, Bo Mao
{"title":"FSDedup: Feature-Aware and Selective Deduplication for Improving Performance of Encrypted Non-Volatile Main Memory","authors":"Chunfeng Du, Zihang Lin, Suzhen Wu, Yifei Chen, Jiapeng Wu, Shengzhe Wang, Weichun Wang, Qingfeng Wu, Bo Mao","doi":"10.1145/3662736","DOIUrl":"https://doi.org/10.1145/3662736","url":null,"abstract":"<p>Enhancing the endurance, performance, and energy efficiency of encrypted Non-Volatile Main Memory (NVMM) can be achieved by minimizing written data through inline deduplication. However, existing approaches applying inline deduplication to encrypted NVMM suffer from substantial performance degradation due to high computing, memory footprint, and index-lookup overhead to generate, store, and query the cryptographic hash (fingerprint). In the preliminary ESD [14], we proposed the Error Correcting Code (ECC) assisted selective deduplication scheme, utilizing the ECC information as a fingerprint to identify similar data effectively and then leveraging the selective deduplication technique to eliminate a large amount of redundant data with high reference counts. In this paper, we propose FSDedup. Compared with ESD, FSDedup leverages the prefetch cache to reduce the read overhead during similarity comparison and utilizes the cache refresh mechanism to further identify and eliminate more redundant data. Extensive experimental evaluations demonstrate that FSDedup can enhance the performance of the NVMM system beyond that of ESD. 
Experimental results show that FSDedup can improve both write and read speed by up to 1.8 ×, enhance Instructions Per Cycle (IPC) by up to 1.5 ×, and reduce energy consumption by up to 2.0 ×, compared to ESD.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"26 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140840473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and Implementation of Deduplication on F2FS","authors":"Tiangmeng Zhang, Renhui Chen, Zijing Li, Congming Gao, Chengke Wang, Jiwu Shu","doi":"10.1145/3662735","DOIUrl":"https://doi.org/10.1145/3662735","url":null,"abstract":"<p>Data deduplication technology has gained popularity in modern file systems due to its ability to eliminate redundant writes and improve storage space efficiency. In recent years, the flash-friendly file system (F2FS) has been widely adopted in flash memory based storage devices, including smartphones, fast-speed servers and Internet of Things. In this paper, we propose F2DFS (deduplication-based F2FS), which introduces three main design contributions. First, F2DFS integrates inline and offline hybrid deduplication. Inline deduplication eliminates redundant writes and enhances flash device endurance, while offline deduplication mitigates the negative I/O performance impact and saves more storage space. Second, F2DFS follows the file system coupling design principle, effectively leveraging the potentials and benefits of both deduplication and native F2FS. Also, with the aid of this principle, F2DFS achieves high-performance and space-efficient incremental deduplication. Third, F2DFS adopts virtual indexing to mitigate deduplication-induced many-to-one mapping updates during the segment cleaning. We conducted comprehensive experimental comparisons between F2DFS, native F2FS, and other state-of-the-art deduplication schemes, using both synthetic and real-world workloads. For inline deduplication, F2DFS outperforms SmartDedup, Dmdedup, and ZFS, in terms of both I/O bandwidth performance and deduplication rates. And for offline deduplication, compared to SmartDedup, XFS and BtrFS, F2DFS shows higher execution efficiency, lower resource usage and greater storage space savings. 
Moreover, F2DFS demonstrates more efficient segment cleanings than native F2FS.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"39 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140812433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Index Shipping for Efficient Replication in LSM Key-Value Stores with Hybrid KV Placement","authors":"Giorgos Stilianakis, Giorgos Saloustros, Orestis Chiotakis, Giorgos Xanthakis, Angelos Bilas","doi":"10.1145/3658672","DOIUrl":"https://doi.org/10.1145/3658672","url":null,"abstract":"<p>Key-value (KV) stores based on LSM tree have become a foundational layer in the storage stack of datacenters and cloud services. Current approaches for achieving reliability and availability favor reducing network traffic and send to replicas only new KV pairs. As a result, they perform costly compactions to reorganize data in both the primary and backup nodes, which increases device I/O traffic and CPU overhead, and eventually hurts overall system performance. In this paper we describe <i>Tebis</i>, an efficient LSM-based KV store that reduces I/O amplification and CPU overhead for maintaining the replica index. We use a primary-backup replication scheme that performs compactions only on the primary nodes and sends pre-built indexes to backup nodes, avoiding all compactions in backup nodes. Our approach includes an efficient mechanism to deal with pointer translation across nodes in the pre-built region index. 
Our results show that <i>Tebis</i> reduces resource utilization on backup nodes compared to performing full compactions: Throughput is increased by 1.06 − 2.90 ×, CPU efficiency is increased by 1.21 − 2.78 ×, and I/O amplification is reduced by 1.7 − 3.27 ×, while network traffic increases by up to 1.32 − 3.76 ×.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"6 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140588933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jaehong Min, Chenxingyu Zhao, Ming Liu, Arvind Krishnamurthy
{"title":"eZNS: Elastic Zoned Namespace for Enhanced Performance Isolation and Device Utilization","authors":"Jaehong Min, Chenxingyu Zhao, Ming Liu, Arvind Krishnamurthy","doi":"10.1145/3653716","DOIUrl":"https://doi.org/10.1145/3653716","url":null,"abstract":"<p>Emerging Zoned Namespace (ZNS) SSDs, providing the coarse-grained zone abstraction, hold the potential to significantly enhance the cost-efficiency of future storage infrastructure and mitigate performance unpredictability. However, existing ZNS SSDs have a static zoned interface, making them inadaptable to workload runtime behavior, unscalable to underlying hardware capabilities, and prone to interfering with co-located zones. Applications either under-provision the zone resources yielding unsatisfied throughput, create over-provisioned zones and incur costs, or experience unexpected I/O latencies. </p><p>We propose eZNS, an elastic-zoned namespace interface that exposes an adaptive zone with predictable characteristics. eZNS comprises two major components: a zone arbiter that manages zone allocation and active resources on the control plane, and a hierarchical I/O scheduler with read congestion control and write admission control on the data plane. Together, eZNS enables the transparent use of a ZNS SSD and closes the gap between application requirements and zone interface properties. 
Our evaluations over RocksDB demonstrate that eZNS outperforms a static zoned interface by up to 17.7% in throughput and up to 80.3% in tail latency.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"52 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140576562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuanhui Zhou, Jian Zhou, Kai Lu, Ling Zhan, Peng Xu, Peng Wu, Shuning Chen, Xian Liu, Jiguang Wan
{"title":"A Contract-Aware and Cost-effective LSM Store for Cloud Storage with Low Latency Spikes","authors":"Yuanhui Zhou, Jian Zhou, Kai Lu, Ling Zhan, Peng Xu, Peng Wu, Shuning Chen, Xian Liu, Jiguang Wan","doi":"10.1145/3643851","DOIUrl":"https://doi.org/10.1145/3643851","url":null,"abstract":"<p>Cloud storage is gaining popularity because features such as pay-as-you-go significantly reduce storage costs. However, the community has not sufficiently explored its contract model and latency characteristics. As LSM-Tree-based key-value stores (LSM stores) become the building block for numerous cloud applications, how cloud storage would impact the performance of key-value accesses is vital. This study reveals the significant latency variances of Amazon Elastic Block Store (EBS) under various I/O pressures, which challenges LSM store read performance on cloud storage. To reduce the corresponding tail latency, we propose Calcspar, a contract-aware LSM store for cloud storage, which efficiently addresses the challenges by regulating the rate of I/O requests to cloud storage and absorbing surplus I/O requests with the data cache. We specifically developed a fluctuation-aware cache to lower the high latency brought on by workload fluctuations. Additionally, we build a congestion-aware IOPS allocator to reduce the impact of LSM store internal operations on read latency. We evaluated Calcspar on EBS with different real-world workloads and compared it to the cutting-edge LSM stores. The results show that Calcspar can significantly reduce tail latency while maintaining regular read and write performance, keeping the 99<sup>th</sup> percentile latency under 550<i>μ</i>s and reducing average latency by 66%. 
In addition, Calcspar has lower write prices and average latency compared to Cloud NoSQL services offered by cloud vendors.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"36 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139956219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to the Special Section on USENIX ATC 2023","authors":"Dan Williams, Julia Lawall","doi":"10.1145/3635156","DOIUrl":"https://doi.org/10.1145/3635156","url":null,"abstract":"<p>No abstract available.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"49 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139918299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}