Proceedings of the Second International Symposium on Memory Systems: Latest Articles

Checkpointing Exascale Memory Systems with Existing Memory Technologies
Proceedings of the Second International Symposium on Memory Systems Pub Date : 2016-10-03 DOI: 10.1145/2989081.2989121
Nilmini Abeyratne, H. Chen, Byoungchan Oh, R. Dreslinski, C. Chakrabarti, T. Mudge
Abstract: Building exascale supercomputers requires resilience to failing components such as processors, memory, storage, and network devices. Checkpoint/restart is a key ingredient in attaining resilience, but providing fast and reliable checkpointing is becoming more challenging as the amount of data to checkpoint and the number of components that can fail increase in exascale systems. To improve the speed of checkpointing, emerging non-volatile memories (phase-change, magnetic, and resistive RAM) have been proposed. However, using unproven memories to create checkpoints will only increase the design risk for exascale memory systems. In this paper, we show that exascale systems with hundreds of petabytes of memory can be constructed with commodity DRAM and SSD flash memory, and that newer non-volatile memories are unnecessary, at least for the next generation. The challenge when using commodity parts is providing fast and reliable checkpointing to protect against system failures. A straightforward solution of checkpointing to local flash-based SSD devices will not work because they are endurance- and performance-limited. We present a checkpointing solution that employs a combination of DRAM and SSD devices. A Checkpoint Location Controller (CLC) monitors the endurance of the SSD and the performance loss of the application, and decides dynamically whether to checkpoint to the DRAM or the SSD. The CLC improves both SSD endurance and application slowdown, but the checkpoints in DRAM are exposed to device failures. To design a reliable exascale memory, we protect the data with a low-latency ECC that can correct all errors due to bit/pin/column/word faults and also detect errors due to chip failures, and we protect the checkpoint with a Chipkill-Correct level ECC that allows reliable checkpointing to the DRAM. Using our system, the SSD lifetime increases by about 2x, from 3 years to 6.3 years. Furthermore, the CLC reduces the average checkpointing overhead by nearly 10x (to a 47% slowdown, down from 420%) compared to when the application always checkpoints to the SSD.
Citations: 6
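The abstract says the CLC watches SSD endurance and application slowdown and picks DRAM or SSD per checkpoint, but it does not give the decision rule. The following is only a minimal sketch of one plausible rule; the names `ClcState` and `choose_target`, the budget formula, and all thresholds are assumptions made for illustration, not the authors' design.

```python
from dataclasses import dataclass


@dataclass
class ClcState:
    """Running totals a hypothetical controller might track (assumed fields)."""
    ssd_bytes_written: float = 0.0  # checkpoint bytes sent to the SSD so far
    elapsed_s: float = 0.0          # wall-clock time since deployment
    compute_s: float = 0.0          # time spent on useful application work
    checkpoint_s: float = 0.0       # time spent stalled on checkpoints


def choose_target(state, ssd_endurance_bytes, target_lifetime_s, max_slowdown):
    """Return "ssd" or "dram" for the next checkpoint.

    Assumed rule: use the SSD only while both budgets hold.
      * Endurance: bytes written so far must not exceed the share of the
        total endurance budget that has accrued over the elapsed time.
      * Performance: checkpoint stall time, as a fraction of useful
        compute time, must not exceed max_slowdown.
    """
    allowed_bytes = ssd_endurance_bytes * (state.elapsed_s / target_lifetime_s)
    within_endurance = state.ssd_bytes_written <= allowed_bytes
    slowdown = state.checkpoint_s / state.compute_s if state.compute_s else 0.0
    within_slowdown = slowdown <= max_slowdown
    return "ssd" if (within_endurance and within_slowdown) else "dram"
```

Falling back to DRAM whenever either budget is exceeded matches the trade-off named in the abstract: fewer SSD writes and less stall time, at the cost of checkpoints that need stronger ECC protection in DRAM.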
The Case for Associative DRAM Caches
Proceedings of the Second International Symposium on Memory Systems Pub Date : 2016-10-03 DOI: 10.1145/2989081.2989120
Paul Tschirhart, Jim Stevens, Zeshan A. Chishti, B. Jacob
Abstract: In-package DRAM caches are a promising new development that may enable the continued scaling of main memory by facilitating the creation of multi-level memory systems that can effectively utilize dense non-volatile memory technologies. However, determining an appropriate storage scheme for the large amount of metadata needed by these new caches has proven difficult. As a result, prior work has suggested that associativity, with its additional metadata requirements, may not be well suited for use in large in-package DRAM caches. This work makes the case that, despite these problems, associativity is still a desirable feature for DRAM caches, by demonstrating the benefits of associativity for a wide range of cache configurations and workloads.
Citations: 3
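To make the associativity argument concrete, here is a toy miss counter for an LRU set-associative cache. It is a generic textbook model, not the simulator or the DRAM-cache organization used in the paper; the function name `count_misses` and the 64-byte block size are assumptions.

```python
from collections import OrderedDict


def count_misses(addresses, num_sets, ways, block_bytes=64):
    """Count misses in an LRU cache holding num_sets * ways blocks."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // block_bytes
        index, tag = block % num_sets, block // num_sets
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)         # refresh LRU position on a hit
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used tag
            lines[tag] = True
    return misses


# Two blocks that collide in a 4-block direct-mapped cache.
trace = [0, 256] * 4
direct_mapped = count_misses(trace, num_sets=4, ways=1)
two_way = count_misses(trace, num_sets=2, ways=2)
```

With equal capacity, the direct-mapped configuration misses on every access in this trace, while the 2-way configuration misses only on the first touch of each block. The price is the extra tag metadata per set that the abstract identifies as the obstacle.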
A Validation of DRAM RAPL Power Measurements
Proceedings of the Second International Symposium on Memory Systems Pub Date : 2016-10-03 DOI: 10.1145/2989081.2989088
Spencer Desrochers, Chad Paradis, Vincent M. Weaver
Abstract: Recent Intel processors support the Running Average Power Limit (RAPL) interface, which among other things provides estimated energy measurements for the CPUs, integrated GPU, and DRAM. These measurements are easily accessible by the user and can be gathered by a wide variety of tools, including the Linux perf_event interface. This allows unprecedentedly easy access to energy information when designing and optimizing energy-aware code. While greatly useful, on most systems these RAPL measurements are estimated values, generated on the fly by an on-chip energy model. The values are not well documented, and the results (especially the DRAM results) have undergone only limited validation. We validate the DRAM RAPL results on both desktop and server Haswell machines, with multiple types of DDR3 and DDR4 memory. We instrument the hardware to gather actual power measurements and compare them to the RAPL values returned via Linux perf_event. We describe the many challenges encountered when instrumenting systems for detailed power measurement. We find that the RAPL results match overall energy and power trends, usually differing by a constant power offset. The results match best when the DRAM is being heavily utilized, but do not match as well when the system is idle or when an integrated GPU is using the memory. We also verify that Haswell server machines produce more accurate results, as they include actual power measurements gathered through the integrated voltage regulator.
Citations: 87
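The comparison described in the abstract comes down to two pieces of arithmetic: turning cumulative energy readings into average power, and estimating the constant offset between RAPL and instrumented measurements. The sketch below illustrates both under stated assumptions; the function names are hypothetical, the inputs are plain numbers rather than values read from perf_event, and the single-wrap handling is a simplification.

```python
from statistics import mean


def average_power_w(energy_start_j, energy_end_j, interval_s, wrap_j=None):
    """Average power in watts from two cumulative energy readings in joules.

    wrap_j is the counter range, used if the counter wrapped at most once
    during the interval (an assumption of this sketch).
    """
    delta = energy_end_j - energy_start_j
    if delta < 0 and wrap_j is not None:
        delta += wrap_j
    return delta / interval_s


def constant_offset_w(rapl_w, measured_w):
    """Estimate a constant offset as the mean of (measured - RAPL).

    For a model measured = RAPL + c, the mean difference is the
    least-squares estimate of c.
    """
    if not rapl_w or len(rapl_w) != len(measured_w):
        raise ValueError("need two non-empty series of equal length")
    return mean(m - r for r, m in zip(rapl_w, measured_w))
```

If the residuals after subtracting the offset stay small under heavy DRAM load but grow when the system is idle, that would be consistent with the pattern the abstract reports.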
Performance Impact of a Slower Main Memory: A Case Study of STT-MRAM in HPC
Proceedings of the Second International Symposium on Memory Systems Pub Date : 2016-10-03 DOI: 10.1145/2989081.2989082
Kazi Asifuzzaman, Milan Pavlović, M. Radulovic, D. Zaragoza, Oh-Jeong Kwon, K. Ryoo, Petar Radojkovic
Abstract: In high-performance computing (HPC), significant effort is invested in research and development of novel memory technologies. One of them is Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM), a byte-addressable, high-endurance non-volatile memory with slightly higher access time than DRAM. In this study, we conduct a preliminary assessment of the impact of STT-MRAM main memory on HPC system performance, using recent industry estimates. Reliable timing parameters of STT-MRAM devices are unavailable, so we also perform a sensitivity analysis that correlates the overall system slowdown trend with average device latency. Our results demonstrate that the overall system performance of large HPC clusters is not particularly sensitive to main-memory latency. Therefore, STT-MRAM, as well as any other emerging non-volatile memory with comparable density and access time, can be a viable option for future HPC memory system design.
Citations: 15
CLARA: Circular Linked-List Auto and Self Refresh Architecture
Proceedings of the Second International Symposium on Memory Systems Pub Date : 2016-10-03 DOI: 10.1145/2989081.2989084
Aditya Agrawal, Mike O'Connor, Evgeny Bolotin, Niladrish Chatterjee, J. Emer, S. Keckler
Abstract: With increasing DRAM densities, the performance and energy overheads of refresh operations are increasingly significant. When the system is active, refresh commands render DRAM banks unavailable for increasing periods of time. These refresh operations can interfere with regular memory operations and hurt performance. In addition, when the system is idle, DRAM self-refresh is the dominant source of energy consumption, and it directly impacts battery life and standby time. Prior refresh reduction techniques seek to reduce active-mode auto-refresh energy, reduce self-refresh energy, improve performance, or some combination thereof. In this paper, we present CLARA, a circular linked-list based refresh architecture that meets all three goals with very low overheads and without sacrificing DRAM capacity. This approach exploits the variation in retention time at chip granularity, as opposed to the DIMM-wide, rank granularity used in prior work. CLARA reduces auto- and self-refresh by 86.2%, independent of workload. Auto-refresh reduction improves average CPU performance by 3.1% and 6.5% in the normal and extended temperature ranges, respectively. GPU performance improves by 2.1% on average in the extended temperature range. DRAM idle power during self-refresh is reduced by 44%. The area overhead of CLARA is about 0.085% in the DRAM and negligible in the memory controller.
Citations: 8
DRAMScale: Mechanisms to Increase DRAM Capacity
Proceedings of the Second International Symposium on Memory Systems Pub Date : 2016-10-03 DOI: 10.1145/2989081.2989109
Krishna T. Malladi, Uksong Kang, M. Awasthi, Hongzhong Zheng
Abstract: New resistive memory technologies promise scalability and non-volatility but suffer from longer, asymmetric read-write latencies and lower endurance, placing the burden of system design on architects. To avoid such pitfalls and still provision for exascale data requirements using a much faster DRAM technology, we introduce DRAMScale. It features three novel mechanisms to increase DRAM density while complementing technology scaling and creating a new capacity-optimized DRAM system. Such optimizations enable us to build a two-tier memory system that meets memory latency and capacity requirements.
Citations: 3
Analyzing Consistency Issues in HMC Atomics
Proceedings of the Second International Symposium on Memory Systems Pub Date : 2016-10-03 DOI: 10.1145/2989081.2989104
Pranith Kumar, Lifeng Nai, Hyesoon Kim
Abstract: As 3D-stacked technology becomes popular, processing-in-memory (PIM) is gaining momentum. The Hybrid Memory Cube (HMC) 2.0 specification offers the host processor a fine-grained, instruction-granularity offloading capability. This work studies the potential consistency issues that arise from offloading atomic instructions from the CPU to the HMC as defined in the current specification.
Citations: 2
Using Memristor Technology for Multi-value Registers in Signed-digit Arithmetic Circuits
Proceedings of the Second International Symposium on Memory Systems Pub Date : 2016-10-03 DOI: 10.1145/2989081.2989124
D. Fey, M. Reichenbach, Christopher Söll, Mehrdad Biglari, Jürgen Röber, R. Weigel
Abstract: Signed-digit (SD) arithmetic exploits positive and negative digits requiring more than two states. It has long been known that an addition using trits, i.e. each digit stores not only a 0 or a 1 but also either 2 or -1, requires only a constant number of steps independent of the operands' word length. However, current processors could not profit from that due to the lack of fast, dense, and CMOS-compatible memory cells that can reliably store multiple states. Memristors offer these features, making it necessary to re-evaluate different SD number representations and to evaluate the consequences of implementing a multi-value register file with memristors in terms of latency, area, and energy consumption. On the one hand, using memristors as multi-value registers reduces latency and area compared to flip-flop based memories. On the other hand, it requires additional sophisticated control circuitry to implement ADCs/DACs and current-limiting circuits, and to generate control signals to read, write, and erase memristors. The paper determines the break-even points at which ternary circuits attached to memristor-based registers show better energy-delay products and less area consumption, and how much power consumption these improvements cost. Layout synthesis shows that ternary adders with trit-storing memristors can reduce latency by about 19% for a word length of 16 digits and by about 52% for 512 digits, compared to a binary carry-look-ahead (CLA) adder with nearly the same power consumption.
Citations: 17
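The constant-step addition mentioned in the abstract can be illustrated with the textbook carry-free scheme for radix-2 signed digits in {-1, 0, 1}. This is a software sketch of that well-known scheme, not the paper's circuit or its chosen digit encoding; the function names and the little-endian digit order are assumptions.

```python
def sd_to_int(digits):
    """Value of a little-endian radix-2 signed-digit number."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def sd_add(x, y):
    """Carry-free addition of two equal-length signed-digit numbers.

    Each position picks a transfer t and interim digit w with
    x[i] + y[i] == 2*t + w, using only the signs of the next-lower
    operand digits. No step depends on another position's result, so
    hardware could evaluate all positions in parallel.
    """
    n = len(x)
    interim = [0] * (n + 1)   # w_i
    transfer = [0] * (n + 1)  # t_i, the transfer into position i
    for i in range(n):
        p = x[i] + y[i]
        # If both lower digits are non-negative, the incoming transfer
        # can only be 0 or +1; otherwise it can only be 0 or -1.
        lower_nonneg = i == 0 or (x[i - 1] >= 0 and y[i - 1] >= 0)
        if p == 2:
            t, w = 1, 0
        elif p == 1:
            t, w = (1, -1) if lower_nonneg else (0, 1)
        elif p == 0:
            t, w = 0, 0
        elif p == -1:
            t, w = (0, -1) if lower_nonneg else (-1, 1)
        else:  # p == -2
            t, w = -1, 0
        interim[i] = w
        transfer[i + 1] = t
    # Final digits: interim digit plus incoming transfer, always in {-1, 0, 1}.
    return [interim[i] + transfer[i] for i in range(n + 1)]
```

Because every result digit depends on operand digits at only three adjacent positions, the logic depth stays fixed as the word length grows. That is the property that makes storing three-valued digits attractive in the first place.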
Challenges of Programming a System with Heterogeneous Memories and Heterogeneous Processors: A Programmer's View
Proceedings of the Second International Symposium on Memory Systems Pub Date : 2016-10-03 DOI: 10.1145/2989081.2989097
Shuai Che, Arkaprava Basu, J. Gallmeier
Abstract: Recently there has been significant development and innovation in both the heterogeneous memory and heterogeneous compute domains. This paper summarizes the challenges, surveys related work, and proposes possible research directions for exploiting both heterogeneous memory and compute resources in a computer system. We focus our discussion on the memory system and also touch on issues related to heterogeneous compute.
Citations: 2
HAPPY: Hybrid Address-based Page Policy in DRAMs
Proceedings of the Second International Symposium on Memory Systems Pub Date : 2015-09-12 DOI: 10.1145/2989081.2989101
M. Ghasempour, A. Jaleel, J. Garside, M. Luján
Abstract: Memory controllers have used static page closure policies to decide whether a row should be left open (open-page policy) or closed immediately (close-page policy) after the row has been accessed. The appropriate choice for a particular access can reduce the average memory latency. However, since application access patterns change at run time, static page policies cannot guarantee optimum execution time. Hybrid page policies have been investigated as a means of covering these dynamic scenarios and are now implemented in state-of-the-art processors. Hybrid page policies switch between open-page and close-page policies while the application is running, by monitoring the access pattern of row hits/conflicts and predicting future behavior. Unfortunately, as the size of DRAM memory increases, fine-grain tracking and analysis of memory access patterns becomes impractical. We propose a compact memory address-based encoding technique that can improve or maintain the performance of DRAM page closure predictors while reducing the hardware overhead in comparison with state-of-the-art techniques. As a case study, we integrate our technique, HAPPY, with a state-of-the-art Intel-adaptive monitor (e.g. part of the Intel Xeon X5650) and a traditional hybrid page policy. We evaluate them across 70 memory-intensive workload mixes consisting of single-thread and multi-thread applications. The experimental results show that applying the HAPPY encoding to the Intel-adaptive page closure policy can reduce the hardware overhead by 5x for the evaluated 64 GB memory (up to 40x for a 512 GB memory) while maintaining the prediction accuracy.
Citations: 13
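The abstract describes hybrid page policies as monitoring row hits and conflicts and predicting whether to keep a row open. A minimal sketch of that general idea follows, using a 2-bit saturating counter per bank. It is neither the HAPPY encoding nor the Intel-adaptive monitor; the class name, the per-bank granularity, and the threshold are all assumptions.

```python
class HybridPagePolicy:
    """Per-bank 2-bit saturating counter; high values favor keeping rows open."""

    def __init__(self, num_banks, threshold=2):
        self.threshold = threshold
        self.counters = [threshold] * num_banks  # start weakly "open"
        self.last_row = [None] * num_banks       # last row accessed per bank

    def access(self, bank, row):
        """Record an access; return True if the row should be left open."""
        previous = self.last_row[bank]
        if previous is not None:
            if previous == row:
                # Same row again: an open-page policy would have hit.
                self.counters[bank] = min(3, self.counters[bank] + 1)
            else:
                # Different row: an open-page policy would have conflicted.
                self.counters[bank] = max(0, self.counters[bank] - 1)
        self.last_row[bank] = row
        return self.counters[bank] >= self.threshold
```

Even this small predictor needs a counter and a row register for every tracked unit, so its state grows with the memory. That growth is the tracking overhead the abstract says a compact address-based encoding aims to cut.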