Proceedings of the 2020 ACM SIGPLAN International Symposium on Memory Management: latest articles

Verified sequential Malloc/Free
A. Appel, D. Naumann
{"title":"Verified sequential Malloc/Free","authors":"A. Appel, D. Naumann","doi":"10.1145/3381898.3397211","DOIUrl":"https://doi.org/10.1145/3381898.3397211","url":null,"abstract":"We verify the functional correctness of an array-of-bins (segregated free-lists) single-thread malloc/free system with respect to a correctness specification written in separation logic. The memory allocator is written in standard C code compatible with the standard API; the specification is in the Verifiable C program logic, and the proof is done in the Verified Software Toolchain within the Coq proof assistant. Our \"resource-aware\" specification can guarantee when malloc will successfully return a block, unlike the standard Posix specification that allows malloc to return NULL whenever it wants to. We also prove subsumption (refinement): the resource-aware specification implies a resource-oblivious spec.","PeriodicalId":301629,"journal":{"name":"Proceedings of the 2020 ACM SIGPLAN International Symposium on Memory Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125835945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
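As a rough illustration of the array-of-bins (segregated free-lists) layout named in the abstract above, here is a minimal Python sketch. It is not the verified C allocator: the granularity `ALIGN`, the bin count `NUM_BINS`, the integer "addresses", and the `guaranteed` helper are all assumptions made for illustration. The helper only hints at the idea of a resource-aware specification, where a caller can know in advance that a request will succeed.

```python
ALIGN = 16      # assumed size-class granularity in bytes (hypothetical)
NUM_BINS = 8    # assumed number of bins (hypothetical)


def size_to_bin(nbytes):
    """Index of the smallest bin whose blocks can hold nbytes."""
    return max(0, (nbytes + ALIGN - 1) // ALIGN - 1)


def bin_to_size(b):
    """Block size served by bin b."""
    return (b + 1) * ALIGN


class BinAllocator:
    """Toy single-threaded allocator: one free list per size class."""

    def __init__(self, blocks_per_bin):
        self.bins = []
        self.owner = {}  # allocated address -> bin index
        next_addr = 0
        for b in range(NUM_BINS):
            free_list = []
            for _ in range(blocks_per_bin):
                free_list.append(next_addr)  # addresses are simulated integers
                next_addr += bin_to_size(b)
            self.bins.append(free_list)

    def guaranteed(self, nbytes):
        """True if malloc(nbytes) is certain to succeed right now."""
        b = size_to_bin(nbytes)
        return b < NUM_BINS and len(self.bins[b]) > 0

    def malloc(self, nbytes):
        b = size_to_bin(nbytes)
        if b >= NUM_BINS or not self.bins[b]:
            return None  # request too large, or bin exhausted
        addr = self.bins[b].pop()
        self.owner[addr] = b
        return addr

    def free(self, addr):
        b = self.owner.pop(addr)
        self.bins[b].append(addr)  # return the block to its own bin
```

In this sketch a caller can check `guaranteed(n)` before calling `malloc(n)`; a specification that lets malloc return NULL at any time offers no such check.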
Garbage collection using a finite liveness domain
Aman Bansal, Saksham Goel, Preey Shah, A. Sanyal, Prasanna Kumar
{"title":"Garbage collection using a finite liveness domain","authors":"Aman Bansal, Saksham Goel, Preey Shah, A. Sanyal, Prasanna Kumar","doi":"10.1145/3381898.3397208","DOIUrl":"https://doi.org/10.1145/3381898.3397208","url":null,"abstract":"Functional languages manage heap data through garbage collection. Since static analysis of heap data is difficult, garbage collectors conservatively approximate the liveness of heap objects by reachability, i.e., every object that is reachable from the root set is considered live. Consequently, a large amount of memory that is reachable but not used further during execution is left uncollected by the collector. Earlier attempts at liveness-based garbage collection for languages supporting structured types were based on analyses that considered arbitrary liveness values, i.e., they assumed that any substructure of the data could be potentially live. This made the analyses complex and unscalable. However, functional programs traverse structured data like lists in identifiable patterns. We identify a set of eight usage patterns that we use as liveness values. The liveness analysis that accompanies our garbage collector is based on this set; liveness arising out of other patterns of traversal is conservatively approximated by this set. This restriction to a small set of liveness values reaps several benefits -- it results in a simple liveness analysis which scales to much larger programs with minimal loss of precision, enables the use of a faster collection technique, and is extendable to higher-order programs. Our experiments with a purely functional subset of Scheme show a reduction in the analysis time by orders of magnitude. In addition, the minimum heap size required to run programs is comparable with a liveness-based collector with unrestricted liveness values, and in situations where memory is limited, the garbage collection time is lower than its reachability counterpart.","PeriodicalId":301629,"journal":{"name":"Proceedings of the 2020 ACM SIGPLAN International Symposium on Memory Management","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125134730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
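The abstract above contrasts liveness with reachability. The sketch below shows only the reachability side, as a standard mark phase over a toy heap; it does not implement the paper's liveness analysis or its eight usage patterns. The dictionary heap representation and the object names are assumptions for illustration.

```python
def reachable(heap, roots):
    """Return the set of objects reachable from roots.

    heap maps an object id to the list of object ids it references.
    A reachability-based collector keeps exactly this set.
    """
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(heap[obj])  # follow outgoing references
    return marked


# "c" is kept because it is reachable from the root "a", even if the
# program never reads it again; a liveness-based collector aims to
# reclaim such objects.
heap = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"]}
kept = reachable(heap, ["a"])
```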
Alligator collector: a latency-optimized garbage collector for functional programming languages
B. Gamari, Laura Dietz
{"title":"Alligator collector: a latency-optimized garbage collector for functional programming languages","authors":"B. Gamari, Laura Dietz","doi":"10.1145/3381898.3397214","DOIUrl":"https://doi.org/10.1145/3381898.3397214","url":null,"abstract":"Modern hardware and applications require runtime systems that can operate under large-heap and low-latency requirements. For many client/server or interactive applications, reducing average and maximum pause times is more important than maximizing throughput. The GHC Haskell runtime system version 8.10.1 offers a new latency-optimized garbage collector as an alternative to the existing throughput-optimized copying garbage collector. This paper details the latency-optimized GC design, which is a generational collector integrating GHC's existing collector and bump-pointer allocator with a non-moving collector and non-moving heap suggested by Ueno and Ohori. We provide an empirical analysis on the latency/throughput tradeoffs. We augment the established nofib micro benchmark with a response-time focused benchmark that simulates real-world applications such as LRU caches, web search, and key-value stores.","PeriodicalId":301629,"journal":{"name":"Proceedings of the 2020 ACM SIGPLAN International Symposium on Memory Management","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131153920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Prefetching in functional languages
S. Ainsworth, Timothy M. Jones
{"title":"Prefetching in functional languages","authors":"S. Ainsworth, Timothy M. Jones","doi":"10.1145/3381898.3397209","DOIUrl":"https://doi.org/10.1145/3381898.3397209","url":null,"abstract":"Functional programming languages contain a number of runtime and language features, such as garbage collection, indirect memory accesses, linked data structures and immutability, that interact with a processor’s memory system. These conspire to cause a variety of unintuitive memory-performance effects. For example, it is slower to traverse through linked lists and arrays of data that have been sorted than to traverse the same data accessed in the order it was allocated. We seek to understand these issues and mitigate them in a manner consistent with functional languages, taking advantage of the features themselves where possible. For example, immutability and garbage collection force linked lists to be allocated roughly sequentially in memory, even when the data pointed to within each node is not. We add language primitives for software-prefetching to the OCaml language to exploit this, and observe significant performance improvements on a variety of micro- and macro-benchmarks, resulting in speedups of up to 2× on the out-of-order superscalar Intel Haswell and Xeon Phi Knights Landing systems, and up to 3× on the in-order Arm Cortex-A53.","PeriodicalId":301629,"journal":{"name":"Proceedings of the 2020 ACM SIGPLAN International Symposium on Memory Management","volume":"38 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121170651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
ThinGC: complete isolation with marginal overhead
A. Yang, Erik Österlund, Jesper Wilhelmsson, Hanna Nyblom, Tobias Wrigstad
{"title":"ThinGC: complete isolation with marginal overhead","authors":"A. Yang, Erik Österlund, Jesper Wilhelmsson, Hanna Nyblom, Tobias Wrigstad","doi":"10.1145/3381898.3397213","DOIUrl":"https://doi.org/10.1145/3381898.3397213","url":null,"abstract":"Previous works on leak-tolerating GC and write-rationing GC show that most reads/writes in an application are concentrated to a small number of objects. This suggests that many applications enjoy a clear and stable clustering of hot (recently read and/or written) and cold (the inverse of hot) objects. These results have been shown in the context of Jikes RVM, for stop-the-world collectors. This paper explores a similar design for a concurrent collector in the context of OpenJDK, plus a separate collector to manage cold objects in their own subheap. We evaluate the design and implementation of ThinGC using algorithms from JGraphT and the DaCapo suite. The results show that ThinGC considers fewer objects cold than previous work, and maintaining separate subheaps of hot and cold objects induces marginal overhead for most benchmarks except one, where large slowdown due to excessive reheats is observed.","PeriodicalId":301629,"journal":{"name":"Proceedings of the 2020 ACM SIGPLAN International Symposium on Memory Management","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133662868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Exploiting inter- and intra-memory asymmetries for data mapping in hybrid tiered-memories
Shihao Song, Anup Das, Nagarajan Kandasamy
{"title":"Exploiting inter- and intra-memory asymmetries for data mapping in hybrid tiered-memories","authors":"Shihao Song, Anup Das, Nagarajan Kandasamy","doi":"10.1145/3381898.3397215","DOIUrl":"https://doi.org/10.1145/3381898.3397215","url":null,"abstract":"Modern computing systems are embracing hybrid memory comprising DRAM and non-volatile memory (NVM) to combine the best properties of both memory technologies, achieving low latency, high reliability, and high density. A prominent characteristic of DRAM-NVM hybrid memory is that it has NVM access latency much higher than DRAM access latency. We call this inter-memory asymmetry. We observe that parasitic components on a long bitline are a major source of high latency in both DRAM and NVM, and a significant factor contributing to high-voltage operations in NVM, which impact their reliability. We propose an architectural change, where each long bitline in DRAM and NVM is split into two segments by an isolation transistor. One segment can be accessed with lower latency and operating voltage than the other. By introducing tiers, we enable non-uniform accesses within each memory type (which we call intra-memory asymmetry), leading to performance and reliability trade-offs in DRAM-NVM hybrid memory. We show that our hybrid tiered-memory architecture has tremendous potential to improve performance and reliability if exploited by an efficient page management policy at the operating system (OS). Modern OSes are already aware of inter-memory asymmetry. They migrate pages between the two memory types during program execution, starting from an initial allocation of the page to a randomly-selected free physical address in the memory. We extend existing OS awareness in three ways. First, we exploit both inter- and intra-memory asymmetries to allocate and migrate memory pages between the tiers in DRAM and NVM. Second, we improve the OS’s page allocation decisions by predicting the access intensity of a newly-referenced memory page in a program and placing it in a matching tier during its initial allocation. This minimizes page migrations during program execution, lowering the performance overhead. Third, we propose a solution to migrate pages between the tiers of the same memory without transferring data over the memory channel, minimizing channel occupancy and improving performance. Our overall approach, which we call MNEME, to enable and exploit asymmetries in DRAM-NVM hybrid tiered memory improves both performance and reliability for both single-core and multi-programmed workloads.","PeriodicalId":301629,"journal":{"name":"Proceedings of the 2020 ACM SIGPLAN International Symposium on Memory Management","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117151525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
Improving phase change memory performance with data content aware access
Shihao Song, Anup Das, O. Mutlu, Nagarajan Kandasamy
{"title":"Improving phase change memory performance with data content aware access","authors":"Shihao Song, Anup Das, O. Mutlu, Nagarajan Kandasamy","doi":"10.1145/3381898.3397210","DOIUrl":"https://doi.org/10.1145/3381898.3397210","url":null,"abstract":"Phase change memory (PCM) is a scalable non-volatile memory technology that has low access latency (like DRAM) and high capacity (like Flash). Writing to PCM incurs significantly higher latency and energy penalties compared to reading its content. A prominent characteristic of PCM’s write operation is that its latency and energy are sensitive to the data to be written as well as the content that is overwritten. We observe that overwriting unknown memory content can incur significantly higher latency and energy compared to overwriting known all-zeros or all-ones content. This is because all-zeros or all-ones content is overwritten by programming the PCM cells only in one direction, i.e., using either SET or RESET operations, not both. In this paper, we propose data content aware PCM writes (DATACON), a new mechanism that reduces the latency and energy of PCM writes by redirecting these requests to overwrite memory locations containing all-zeros or all-ones. DATACON operates in three steps. First, it estimates how much a PCM write access would benefit from overwriting known content (e.g., all-zeros, or all-ones) by comprehensively considering the number of set bits in the data to be written, and the energy-latency trade-offs for SET and RESET operations in PCM. Second, it translates the write address to a physical address within memory that contains the best type of content to overwrite, and records this translation in a table for future accesses. We exploit data access locality in workloads to minimize the address translation overhead. Third, it re-initializes unused memory locations with known all-zeros or all-ones content in a manner that does not interfere with regular read and write accesses. DATACON overwrites unknown content only when it is absolutely necessary to do so. We evaluate DATACON with workloads from state-of-the-art machine learning applications, SPEC CPU2017, and NAS Parallel Benchmarks. Results demonstrate that DATACON improves the effective access latency by 31%, overall system performance by 27%, and total memory system energy consumption by 43% compared to the best of performance-oriented state-of-the-art techniques.","PeriodicalId":301629,"journal":{"name":"Proceedings of the 2020 ACM SIGPLAN International Symposium on Memory Management","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115612881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 37
Understanding and optimizing persistent memory allocation
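The first DATACON step described in the abstract above weighs the number of set bits in the data against the cost of SET and RESET operations. The sketch below is a deliberately simple linear cost model, not the paper's estimator: it assumes that overwriting all-zeros content needs one SET per 1 bit, that overwriting all-ones content needs one RESET per 0 bit, and that the per-operation costs `set_cost` and `reset_cost` are placeholder values.

```python
def choose_target(data, set_cost=1.0, reset_cost=1.0):
    """Pick the known content that is cheaper to overwrite with data.

    Returns (target, estimated_cost), where target is "all-zeros" or
    "all-ones". The costs are hypothetical per-bit weights.
    """
    total_bits = 8 * len(data)
    ones = sum(bin(byte).count("1") for byte in data)
    zeros = total_bits - ones

    cost_over_zeros = ones * set_cost      # only the 1 bits must be SET
    cost_over_ones = zeros * reset_cost    # only the 0 bits must be RESET

    if cost_over_zeros <= cost_over_ones:
        return ("all-zeros", cost_over_zeros)
    return ("all-ones", cost_over_ones)
```

Under this model, mostly-zero data is steered to all-zeros locations and mostly-one data to all-ones locations, and an asymmetric cost ratio shifts the break-even point.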
Wentao Cai, Haosen Wen, H. A. Beadle, Chris Kjellqvist, Mohammad Hedayati, M. Scott
{"title":"Understanding and optimizing persistent memory allocation","authors":"Wentao Cai, Haosen Wen, H. A. Beadle, Chris Kjellqvist, Mohammad Hedayati, M. Scott","doi":"10.1145/3381898.3397212","DOIUrl":"https://doi.org/10.1145/3381898.3397212","url":null,"abstract":"The proliferation of fast, dense, byte-addressable nonvolatile memory suggests that data might be kept in pointer-rich \"in-memory\" format across program runs and even process and system crashes. For full generality, such data requires dynamic memory allocation, and while the allocator could in principle be \"rolled into\" each data structure, it is desirable to make it a separate abstraction. Toward this end, we introduce recoverability, a correctness criterion for persistent allocators, together with a nonblocking allocator, Ralloc, that satisfies this criterion. Ralloc is based on the LRMalloc of Leite and Rocha, with three key innovations. First, we persist just enough information during normal operation to permit correct reconstruction of the heap after a full-system crash. Our reconstruction mechanism performs garbage collection (GC) to identify and remedy any failure-induced memory leaks. Second, we introduce the notion of filter functions, which identify the locations of pointers within persistent blocks to mitigate the limitations of conservative GC. Third, to allow persistent regions to be mapped at an arbitrary address, we employ position-independent (offset-based) pointers for both data and metadata. Experiments show Ralloc to be performance-competitive with both Makalu, the state-of-the-art lock-based persistent allocator, and such transient allocators as LRMalloc and JEMalloc. In particular, reliance on GC and offline metadata reconstruction allows Ralloc to pay almost nothing for persistence during normal operation.","PeriodicalId":301629,"journal":{"name":"Proceedings of the 2020 ACM SIGPLAN International Symposium on Memory Management","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132262598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
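The abstract above mentions position-independent (offset-based) pointers, which let a persistent region be mapped at an arbitrary address. The sketch below illustrates the general idea with a linked list stored in a byte buffer, where each link is an offset from the start of the region rather than an absolute address. It does not reproduce Ralloc's pointer representation: the node layout, the `NULL` sentinel, and the helper names are assumptions.

```python
import struct

NODE = struct.Struct("<qq")  # assumed node layout: (value, next_offset)
NULL = -1                    # assumed sentinel for "no next node"


def build_list(region, values):
    """Write a singly linked list into region; return the head offset."""
    head = NULL
    offset = 0
    # Build back to front so each new node can link to the previous head.
    for value in reversed(values):
        NODE.pack_into(region, offset, value, head)
        head = offset
        offset += NODE.size
    return head


def read_list(region, head):
    """Follow offset links from head and collect the stored values."""
    values = []
    offset = head
    while offset != NULL:
        value, offset = NODE.unpack_from(region, offset)
        values.append(value)
    return values


region = bytearray(64)
head = build_list(region, [1, 2, 3])

# Copying the bytes stands in for mapping the region at a different
# address: the links stay valid because they never store an address.
remapped = bytes(region)
```

Calling `read_list(remapped, head)` walks the same list in the copy, which absolute pointers into the original buffer would not allow.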