{"title":"When SkyPilot meets Kubernetes","authors":"G. Vernik, Ronen I. Kat, O. Cohen, Zongheng Yang","doi":"10.1145/3579370.3594764","DOIUrl":"https://doi.org/10.1145/3579370.3594764","url":null,"abstract":"The Sky vision [3] aims to open a new era in cloud computing. Sky abstracts clouds and dynamically uses multiple clouds to optimize workload execution. This enables users to focus on their business logic rather than interacting with multiple clouds and manually optimizing performance and costs.","PeriodicalId":180024,"journal":{"name":"Proceedings of the 16th ACM International Conference on Systems and Storage","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121277780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cache Line Deltas Compression","authors":"Daniel Cohen, S. Cohen, D. Naor, D. Waddington, Moshik Hershcovitch","doi":"10.1145/3579370.3594753","DOIUrl":"https://doi.org/10.1145/3579370.3594753","url":null,"abstract":"Synchronization of replicated data and program state is an essential aspect of application fault-tolerance. Current solutions use virtual memory mapping to identify page writes and replicate them at the destination. This approach has limitations because the granularity is restricted to a minimum of 4 KiB per page, which may result in more data being replicated than necessary. Motivated by the emerging CXL hardware, we expand on the work of Waddington et al. [SoCC 22] by evaluating popular compression algorithms on VM snapshot data at cache line granularity. We measure the compression ratio vs. the compression time and present our conclusions.","PeriodicalId":180024,"journal":{"name":"Proceedings of the 16th ACM International Conference on Systems and Storage","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124091576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Next-Generation Security Entity Linkage: Harnessing the Power of Knowledge Graphs and Large Language","authors":"Daniel Alfasi, T. Shapira, A. Bremler-Barr","doi":"10.1145/3579370.3594759","DOIUrl":"https://doi.org/10.1145/3579370.3594759","url":null,"abstract":"With the continuous increase in reported Common Vulnerabilities and Exposures (CVEs), security teams are overwhelmed by vast amounts of data, which are often analyzed manually, leading to a slow and inefficient process. To address cybersecurity threats effectively, it is essential to establish connections across multiple security entity databases, including CVEs, Common Weakness Enumeration (CWEs), and Common Attack Pattern Enumeration and Classification (CAPECs). In this study, we introduce a new approach that leverages the RotatE [4] knowledge graph embedding model, initialized with embeddings from the Ada language model developed by OpenAI [3]. Additionally, we extend this approach by initializing the embeddings for the relations.","PeriodicalId":180024,"journal":{"name":"Proceedings of the 16th ACM International Conference on Systems and Storage","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114397437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CCO - Cloud Cost Optimizer","authors":"A. Yehoshua, I. Kolchinsky, A. Schuster","doi":"10.1145/3579370.3594746","DOIUrl":"https://doi.org/10.1145/3579370.3594746","url":null,"abstract":"Cloud computing can be complex, but optimal management of it doesn't have to be. In this paper, we present the design and implementation of a scalable multi-Cloud Cost Optimizer (CCO) that calculates the optimal deployment scheme for a given workload on public or hybrid clouds. The goal of CCO is to reduce monetary costs while taking into account the specifications of the workload, including resource requirements and constraints. By using a combination of meta-heuristics, CCO addresses the combinatorial complexity of the problem and currently supports AWS and Azure. The CCO tool [1] can be accessed through a web UI or an API and supports on-demand and spot instances. For a broader discussion, refer to [2].","PeriodicalId":180024,"journal":{"name":"Proceedings of the 16th ACM International Conference on Systems and Storage","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121762819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RAM buffering for performance improvement of sequential write workload","authors":"Svetlana Lazareva, G. Petrunin","doi":"10.1145/3579370.3594762","DOIUrl":"https://doi.org/10.1145/3579370.3594762","url":null,"abstract":"This paper presents an online algorithm that determines the datapath for incoming requests: whether they should temporarily stay in RAM buffers for a future merge operation or be written to disk immediately. With real-time workload analysis, the delay time spent in the RAM buffer is a self-tuned parameter. This approach increases the latency of sequential write requests but substantially raises the overall performance of sequential write workloads without the use of an expensive non-volatile cache.","PeriodicalId":180024,"journal":{"name":"Proceedings of the 16th ACM International Conference on Systems and Storage","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131401249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development of hybrid storage system based on Open CAS technology, optimized for HPC workload","authors":"Svetlana Lazareva, Ivan Petrov","doi":"10.1145/3579370.3594763","DOIUrl":"https://doi.org/10.1145/3579370.3594763","url":null,"abstract":"HPC runs in a distributed structure with a single shared pool of data. In our case, the distributed structure is the Lustre file system [4], and the single shared pool of data is our declustered HDD RAID (denoted as DCR). To increase performance, we suggest using Open CAS technology [3] as a cache on RAM/NVDIMM with special parameters optimized for heavy, data-intensive sequential HPC workloads, together with an online algorithm that reduces the number of RMW operations by merging sequential requests into a single full-stripe request.","PeriodicalId":180024,"journal":{"name":"Proceedings of the 16th ACM International Conference on Systems and Storage","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129682185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TurboHash: A Hash Table for Key-value Store on Persistent Memory","authors":"Xingsheng Zhao, Chen Zhong, Song Jiang","doi":"10.1145/3579370.3594766","DOIUrl":"https://doi.org/10.1145/3579370.3594766","url":null,"abstract":"Major efforts on the design of persistent hash tables on non-volatile byte-addressable memory focus on efficient support of crash consistency with fence/flush primitives as well as on non-disruptive table rehashing operations. When a data entry in a hash bucket cannot be updated with one atomic write, an out-of-place update, instead of an in-place update, is required to avoid data corruption after a failure. This often causes extra fences/flushes. Meanwhile, when open addressing techniques, such as linear probing, are adopted for a high load factor, the scope of search for a key can be large. Excessive use of fence/flush and extended key search paths are two major sources of performance degradation with hash tables in persistent memory. To address these issues, we design a persistent hash table, named TurboHash, for building high-performance key-value stores. TurboHash has a number of much-desired features all in one design. (1) It supports out-of-place update with a cost equivalent to that of an in-place write to provide lock-free reads. (2) Long-distance linear probing is minimized (only when necessary). (3) It conducts only shard resizing for expansion and avoids expensive directory-level rehashing. (4) It exploits hardware features for high I/O and computation efficiency, including the performance characteristics of Intel Optane DC and Intel AVX instructions. We have implemented TurboHash on the Optane persistent memory and conducted extensive evaluations. Experimental results show that TurboHash improves on the state of the art by 2-8 times in terms of throughput and latency.","PeriodicalId":180024,"journal":{"name":"Proceedings of the 16th ACM International Conference on Systems and Storage","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133827747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Memory Allocation for Multi-Subgraph Mapping on Spatial Accelerators","authors":"Lei Lei, Decai Pan, Dajiang Liu, Peng Ouyang, Xueliang Du","doi":"10.1145/3579370.3594767","DOIUrl":"https://doi.org/10.1145/3579370.3594767","url":null,"abstract":"Spatial accelerators enable the pervasive use of energy-efficient solutions for computation-intensive applications. When mapping onto spatial accelerators, a large kernel is usually partitioned into multiple subgraphs due to resource constraints, leading to more memory accesses and access conflicts. To minimize the access conflicts, existing works either neglect the interference of multiple subgraphs or pay little attention to the data's life cycle along the execution order. To this end, this paper proposes an optimized memory allocation approach for multi-subgraph mapping on spatial accelerators by constructing an optimization problem using Integer Linear Programming (ILP). The experimental results demonstrate that our work can find conflict-free solutions for most kernels and achieve a 1.15× speedup compared to the state-of-the-art approach.","PeriodicalId":180024,"journal":{"name":"Proceedings of the 16th ACM International Conference on Systems and Storage","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133001707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Near-Memory Processing Offload to Remote (Persistent) Memory","authors":"Roei Kisous, Amit Golander, Yigal Korman, Tim Gubner, Rune Humborstad, Manyi Lu","doi":"10.1145/3579370.3594745","DOIUrl":"https://doi.org/10.1145/3579370.3594745","url":null,"abstract":"Traditional Von Neumann computing architectures are struggling to keep up with the rapidly growing demand for scale, performance, power-efficiency and memory capacity. One promising approach to this challenge is Remote Memory, in which memory is accessed over an RDMA fabric [1]. We enhance the remote memory architecture with Near Memory Processing (NMP), a capability that offloads particular compute tasks from the client to the server side, as illustrated in Figure 1. A similar motivation drove IBM to offload object processing to their remote KV storage [2]. NMP offload adds latency and server resource costs; therefore, it should only be used when the offload value is substantial, specifically to save network bandwidth (e.g. Filter/Aggregate), round-trip time (e.g. tree Lookup), and/or distributed locks (e.g. Append to a shared journal).","PeriodicalId":180024,"journal":{"name":"Proceedings of the 16th ACM International Conference on Systems and Storage","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123372251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Benefits of Encryption at the Storage Client","authors":"Or Ozeri, Danny Harnik, Effi Ofer","doi":"10.1145/3579370.3594758","DOIUrl":"https://doi.org/10.1145/3579370.3594758","url":null,"abstract":"Client-side encryption is a setting in which storage I/O is encrypted at the client machine before being sent out to a storage system. This is typically done by adding an encryption layer before the storage client or driver. We identify that in cases where some of the storage functions are performed at the client, it is beneficial to also integrate the encryption into the storage client. We implemented such an encryption layer in Ceph RBD - a popular open source distributed storage system. We explain some of the main benefits of this approach: the ability to do layered encryption with different encryption keys per layer, the ability to support more complex storage encryption, and, finally, a performance boost that we observed from integrating the encryption with the storage client.","PeriodicalId":180024,"journal":{"name":"Proceedings of the 16th ACM International Conference on Systems and Storage","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124366893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}