ACM Transactions on Storage最新文献_第7页

Realizing Strong Determinism Contract on Log-Structured Merge Key-Value Stores 日志结构合并键值库中强确定性契约的实现

IF 1.7 3区计算机科学

ACM Transactions on Storage Pub Date : 2023-02-24 DOI: 10.1145/3582695

Miryeong Kwon, Seungjun Lee, Hyunkyu Choi, Jooyoung Hwang, Myoungsoo Jung

引用次数: 0

Boosting Cache Performance by Access Time Measurements 通过访问时间测量提高缓存性能

IF 1.7 3区计算机科学

ACM Transactions on Storage Pub Date : 2023-02-17 DOI: https://dl.acm.org/doi/10.1145/3572778

Gil Einziger, Omri Himelbrand, Erez Waisbard

引用次数: 0

The Design of Fast and Lightweight Resemblance Detection for Efficient Post-Deduplication Delta Compression 用于高效重复数据删除后增量压缩的快速轻量级相似性检测设计

IF 1.7 3区计算机科学

ACM Transactions on Storage Pub Date : 2023-02-16 DOI: 10.1145/3584663

Wen Xia, Lifeng Pu, Xiangyu Zou, Philip Shilane, Shiyi Li, Haijun Zhang, Xuan Wang

{"title":"The Design of Fast and Lightweight Resemblance Detection for Efficient Post-Deduplication Delta Compression","authors":"Wen Xia, Lifeng Pu, Xiangyu Zou, Philip Shilane, Shiyi Li, Haijun Zhang, Xuan Wang","doi":"10.1145/3584663","DOIUrl":"https://doi.org/10.1145/3584663","url":null,"abstract":"Post-deduplication delta compression is a data reduction technique that calculates and stores the differences of very similar but non-duplicate chunks in storage systems, which is able to achieve a very high compression ratio. However, the low throughput of widely used resemblance detection approaches (e.g., N-Transform) usually becomes the bottleneck of delta compression systems due to introducing high computational overhead. Generally, this overhead mainly consists of two parts: ① calculating the rolling hash byte by byte across data chunks and ② applying multiple transforms on all of the calculated rolling hash values. In this article, we propose Odess, a fast and lightweight resemblance detection approach, that greatly reduces the computational overhead for resemblance detection while achieving high detection accuracy and a high compression ratio. Odess first utilizes a novel Subwindow-based Parallel Rolling (SWPR) hash method using Single Instruction Multiple Data [1] (SIMD) to accelerate calculation of rolling hashes (corresponding to the first part of the overhead). Odess then uses a novel Content-Defined Sampling method to generate a much smaller proxy hash set from the whole rolling hash set and quickly applies transforms on this small hash set for resemblance detection (corresponding to the second part of the overhead). Evaluation results show that during the stage of resemblance detection, the Odess approach is ∼31.4× and ∼7.9× faster than the state-of-the-art N-Transform and Finesse (a recent variant of N-Transform [39]), respectively. When considering an end-to-end data reduction storage system, the Odess-based system’s throughput is about 3.20× and 1.41× higher than the N-Transform- and Finesse-based systems’ throughput, respectively, while maintaining the high compression ratio of N-Transform and achieving ∼1.22× higher compression ratio over Finesse.","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"109 3","pages":"1 - 30"},"PeriodicalIF":1.7,"publicationDate":"2023-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41294923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

TriCache: A User-Transparent Block Cache Enabling High-Performance Out-of-Core Processing with In-Memory Programs TriCache：一种用户透明的块缓存，使用内存程序实现高性能的核外处理

IF 1.7 3区计算机科学

ACM Transactions on Storage Pub Date : 2023-02-13 DOI: 10.1145/3583139

Guan Feng, Huanqi Cao, Xiaowei Zhu, Bowen Yu, Yuanwei Wang, Zixuan Ma, Shengqi Chen, Wenguang Chen

{"title":"TriCache: A User-Transparent Block Cache Enabling High-Performance Out-of-Core Processing with In-Memory Programs","authors":"Guan Feng, Huanqi Cao, Xiaowei Zhu, Bowen Yu, Yuanwei Wang, Zixuan Ma, Shengqi Chen, Wenguang Chen","doi":"10.1145/3583139","DOIUrl":"https://doi.org/10.1145/3583139","url":null,"abstract":"Out-of-core systems rely on high-performance cache sub-systems to reduce the number of I/O operations. Although the page cache in modern operating systems enables transparent access to memory and storage devices, it suffers from efficiency and scalability issues on cache misses, forcing out-of-core systems to design and implement their own cache components, which is a non-trivial task. This study proposes TriCache, a cache mechanism that enables in-memory programs to efficiently process out-of-core datasets without requiring any code rewrite. It provides a virtual memory interface on top of the conventional block interface to simultaneously achieve user transparency and sufficient out-of-core performance. A multi-level block cache design is proposed to address the challenge of per-access address translations required by a memory interface. It can exploit spatial and temporal localities in memory or storage accesses to render storage-to-memory address translation and page-level concurrency control adequately efficient for the virtual memory interface. Our evaluation shows that in-memory systems operating on top of TriCache can outperform Linux OS page cache by more than one order of magnitude, and can deliver performance comparable to or even better than that of corresponding counterparts designed specifically for out-of-core scenarios.","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":" ","pages":"1 - 30"},"PeriodicalIF":1.7,"publicationDate":"2023-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45651206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

ZNSwap: un-Block your Swap ZNSwap：取消阻止您的Swap

IF 1.7 3区计算机科学

ACM Transactions on Storage Pub Date : 2023-02-01 DOI: 10.1145/3582434

Shai Bergman, Niklas Cassel, Matias Bjørling, M. Silberstein

引用次数: 4

CacheSack: Theory and Experience of Google’s Admission Optimization for Datacenter Flash Caches CacheSack：谷歌数据中心Flash缓存准入优化的理论与经验

IF 1.7 3区计算机科学

ACM Transactions on Storage Pub Date : 2023-01-21 DOI: 10.1145/3582014

Tzu-Wei Yang, Seth Pollen, Mustafa Uysal, A. Merchant, H. Wolfmeister, Junaid Khalid

引用次数: 0

KVRangeDB: Range Queries for a Hash-based Key–Value Device KVRangeDB：基于哈希的键值设备的范围查询

IF 1.7 3区计算机科学

ACM Transactions on Storage Pub Date : 2023-01-21 DOI: 10.1145/3582013

Mian Qin, Qing Zheng, Jason Lee, B. Settlemyer, Fei Wen, N. Reddy, Paul V. Gratz

{"title":"KVRangeDB: Range Queries for a Hash-based Key–Value Device","authors":"Mian Qin, Qing Zheng, Jason Lee, B. Settlemyer, Fei Wen, N. Reddy, Paul V. Gratz","doi":"10.1145/3582013","DOIUrl":"https://doi.org/10.1145/3582013","url":null,"abstract":"Key–value (KV) software has proven useful to a wide variety of applications including analytics, time-series databases, and distributed file systems. To satisfy the requirements of diverse workloads, KV stores have been carefully tailored to best match the performance characteristics of underlying solid-state block devices. Emerging KV storage device is a promising technology for both simplifying the KV software stack and improving the performance of persistent storage-based applications. However, while providing fast, predictable put and get operations, existing KV storage devices do not natively support range queries that are critical to all three types of applications described above. In this article, we present KVRangeDB, a software layer that enables processing range queries for existing hash-based KV solid-state disks (KVSSDs). As an effort to adapt to the performance characteristics of emerging KVSSDs, KVRangeDB implements log-structured merge tree key index that reduces compaction I/O, merges keys when possible, and provides separate caches for indexes and values. We evaluated the KVRangeDB under a set of representative workloads, and compared its performance with two existing database solutions: a Rocksdb variant ported to work with the KVSSD, and Wisckey, a key–value database that is carefully tuned for conventional block devices. On filesystem aging workloads, KVRangeDB outperforms Wisckey by 23.7× in terms of throughput and reduce CPU usage and external write amplifications by 14.3× and 9.8×, respectively.","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"19 1","pages":"1 - 21"},"PeriodicalIF":1.7,"publicationDate":"2023-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42130474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Localized Validation Accelerates Distributed Transactions on Disaggregated Persistent Memory 本地化验证加速了分解持久内存上的分布式事务

IF 1.7 3区计算机科学

ACM Transactions on Storage Pub Date : 2023-01-21 DOI: 10.1145/3582012

Ming Zhang, Yu Hua, Pengfei Zuo, Lurong Liu

{"title":"Localized Validation Accelerates Distributed Transactions on Disaggregated Persistent Memory","authors":"Ming Zhang, Yu Hua, Pengfei Zuo, Lurong Liu","doi":"10.1145/3582012","DOIUrl":"https://doi.org/10.1145/3582012","url":null,"abstract":"Persistent memory (PM) disaggregation significantly improves the resource utilization and failure isolation to build a scalable and cost-effective remote memory pool in modern data centers. However, due to offering limited computing power and overlooking the bandwidth and persistence properties of real PMs, existing distributed transaction schemes, which are designed for legacy DRAM-based monolithic servers, fail to efficiently work on the disaggregated PM. In this article, we propose FORD, a Fast One-sided RDMA-based Distributed transaction system for the new disaggregated PM architecture. FORD thoroughly leverages one-sided remote direct memory access to handle transactions for bypassing the remote CPU in the PM pool. To reduce the round trips, FORD batches the read and lock operations into one request to eliminate extra locking and validations for the read-write data. To accelerate the transaction commit, FORD updates all remote replicas in a single round trip with parallel undo logging and data visibility control. Moreover, considering the limited PM bandwidth, FORD enables the backup replicas to be read to alleviate the load on the primary replicas, thus improving the throughput. To efficiently guarantee the remote data persistency in the PM pool, FORD selectively flushes data to the backup replicas to mitigate the network overheads. Nevertheless, the original FORD wastes some validation round trips if the read-only data are not modified by other transactions. Hence, we further propose a localized validation scheme to transfer the validation operations for the read-only data from remote to local as much as possible to reduce the round trips. Experimental results demonstrate that FORD significantly improves the transaction throughput by up to 3× and decreases the latency by up to 87.4% compared with state-of-the-art systems.","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":" ","pages":"1 - 35"},"PeriodicalIF":1.7,"publicationDate":"2023-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49399837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

FlatLSM: Write-Optimized LSM-Tree for PM-Based KV Stores FlatLSM：用于基于PM的KV存储的写优化LSM树

IF 1.7 3区计算机科学

ACM Transactions on Storage Pub Date : 2023-01-21 DOI: 10.1145/3579855

Kewen He, Yujie An, Yijing Luo, Xiaoguang Liu, Gang Wang

{"title":"FlatLSM: Write-Optimized LSM-Tree for PM-Based KV Stores","authors":"Kewen He, Yujie An, Yijing Luo, Xiaoguang Liu, Gang Wang","doi":"10.1145/3579855","DOIUrl":"https://doi.org/10.1145/3579855","url":null,"abstract":"The Log-Structured Merge Tree (LSM-Tree) is widely used in key-value (KV) stores because of its excwrite performance. But LSM-Tree-based KV stores still have the overhead of write-ahead log and write stall caused by slow L0 flush and L0-L1 compaction. New byte-addressable, persistent memory (PM) devices bring an opportunity to improve the write performance of LSM-Tree. Previous studies on PM-based LSM-Tree have not fully exploited PM’s “dual role” of main memory and external storage. In this article, we analyze two strategies of memtables based on PM and the reasons write stall problems occur in the first place. Inspired by the analysis result, we propose FlatLSM, a specially designed flat LSM-Tree for non-volatile memory based KV stores. First, we propose PMTable with separated index and data. The PM Log utilizes the Buffer Log to store KVs of size less than 256B. Second, to solve the write stall problem, FlatLSM merges the volatile memtables and the persistent L0 into large PMTables, which can reduce the depth of LSM-Tree and concentrate I/O bandwidth on L0-L1 compaction. To mitigate write stall caused by flushing large PMTables to SSD, we propose a parallel flush/compaction algorithm based on KV separation. We implemented FlatLSM based on RocksDB and evaluated its performance on Intel’s latest PM device, the Intel Optane DC PMM with the state-of-the-art PM-based LSM-Tree KV stores, FlatLSM improves the throughput 5.2× on random write workload and 2.55× on YCSB-A.","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"19 1","pages":"1 - 26"},"PeriodicalIF":1.7,"publicationDate":"2023-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49366031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Improving Storage Systems Using Machine Learning 利用机器学习改进存储系统

IF 1.7 3区计算机科学

ACM Transactions on Storage Pub Date : 2023-01-19 DOI: https://dl.acm.org/doi/10.1145/3568429

Ibrahim Umit Akgun, Ali Selman Aydin, Andrew Burford, Michael McNeill, Michael Arkhangelskiy, Erez Zadok

{"title":"Improving Storage Systems Using Machine Learning","authors":"Ibrahim Umit Akgun, Ali Selman Aydin, Andrew Burford, Michael McNeill, Michael Arkhangelskiy, Erez Zadok","doi":"https://dl.acm.org/doi/10.1145/3568429","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3568429","url":null,"abstract":"<p>Operating systems include many heuristic algorithms designed to improve overall storage performance and throughput. Because such heuristics cannot work well for all conditions and workloads, system designers resorted to exposing numerous tunable parameters to users—thus burdening users with continually optimizing their own storage systems and applications. Storage systems are usually responsible for most latency in I/O-heavy applications, so even a small latency improvement can be significant. Machine learning (ML) techniques promise to learn patterns, generalize from them, and enable optimal solutions that adapt to changing workloads. We propose that ML solutions become a first-class component in OSs and replace manual heuristics to optimize storage systems dynamically. In this article, we describe our proposed ML architecture, called KML. We developed a prototype KML architecture and applied it to two case studies: optimizing readahead and NFS read-size values. Our experiments show that KML consumes less than 4 KB of dynamic kernel memory, has a CPU overhead smaller than 0.2%, and yet can learn patterns and improve I/O throughput by as much as 2.3× and 15× for two case studies—even for complex, never-seen-before, concurrently running mixed workloads on different storage devices.</p>","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"59 4","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0