{"title":"Understanding the Impact of Air and Microfluidics Cooling on Performance of 3D Stacked Memory Systems","authors":"S. M. Hassan, S. Yalamanchili","doi":"10.1145/2989081.2989098","DOIUrl":"https://doi.org/10.1145/2989081.2989098","url":null,"abstract":"Three-dimensional stacking has increased the memory bandwidth available to cores allowing sustainable performance improvement through technology generations. However, lower heat removal capability and higher DRAM density in such systems increases their temperature and requires larger number of rows to be refreshed at significantly higher rates. Higher operating temperature prohibits performance scaling by not only decreasing memory bandwidth availability but also reducing core frequency specially in the case where memory is stacked directly on top of the processor die (3D). Liquid cooling using microfluidics technology is a promising solution that keeps the temperature low increasing the operating range of 3D systems, thus allowing sustained performance improvement. This work attempts to understand the impact of temperature on performance and the advantages of using microfluidics technology for continued performance scaling. We show that conventional air cooling solutions limit 3D stacks to work only for memory-intensive applications running at low frequency, whereas microfluidics cooling technology allow them to push their envelope to not only compute intensive domains but also memory-intensive scenarios that can run at significantly higher operating frequencies.","PeriodicalId":283512,"journal":{"name":"Proceedings of the Second International Symposium on Memory Systems","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127074478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterizing the Performance of Hybrid Memory Cube Using ApexMAP Application Probes","authors":"K. Ibrahim, Farzad Fatollahi-Fard, D. Donofrio, J. Shalf","doi":"10.1145/2989081.2989090","DOIUrl":"https://doi.org/10.1145/2989081.2989090","url":null,"abstract":"Full characterization of the performance of a new memory technology is typically a subtle process because of the difficulty in subjecting the memory to different access patterns before creating a full system. Simple performance characterization, such as raw bandwidth, does not give enough information about the suitability of the memory for different architectural design choices, such as suitability for processing in memory, performance reliance on relaxed ordering semantic, or how to implement atomics, etc. This paper discusses the use of the ApexMAP synthetic benchmarks to assess the Hybrid Memory Cube (HMC) technology. ApexMAP, through a simple model for spatial and temporal locality, allows creating many application probes that could be used to subject the memory to different access patterns. We use a Verilog implementation of ApexMAP to show the impact of contending requests, flow control, and access granularity on the HMC performance. We show a wide variation (up to 20×) in the observed performance based on the application locality parameters and the HMC architectural configurations.","PeriodicalId":283512,"journal":{"name":"Proceedings of the Second International Symposium on Memory Systems","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127364685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Languages Must Expose Memory Heterogeneity","authors":"Xiaochen Guo, Aviral Shrivastava, Michael F. Spear, Gang Tan","doi":"10.1145/2989081.2989122","DOIUrl":"https://doi.org/10.1145/2989081.2989122","url":null,"abstract":"The last decade has seen an explosion in new and innovative memory technologies. While certain technologies, like transactional memory, have seen adoption at the language level, others, such as sandboxed memory, scratchpad memory, and persistent memory, have not received any systematic programming language support. This is true even though the underlying compiler-level mechanisms for these mechanisms are similar. In this paper, we argue that programming languages must be enhanced to expose heterogeneous memory technologies to programmers, so that they can enjoy the benefits of those technologies and be able to reason about programs that use the advanced features of novel memory technologies. We sketch a language design that allows programmers to specify memory requirements and behaviors, for both data and code. We further describe how a compiler can support such a language and suggest hardware improvements that can improve efficiencies of heterogeneous memories.","PeriodicalId":283512,"journal":{"name":"Proceedings of the Second International Symposium on Memory Systems","volume":"509 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134201348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How Many MLCs Should Impersonate SLCs to Optimize SSD Performance?","authors":"Wei Wang, Wen Pan, T. Xie, Deng Zhou","doi":"10.1145/2989081.2989095","DOIUrl":"https://doi.org/10.1145/2989081.2989095","url":null,"abstract":"Since an MLC (multi-level cell) can be used in an SLC (single-level cell) mode, an MLC-based flash SSD typically uses a fixed small portion (called log partition) in the SLC mode to accommodate hot data so that its overall performance can be improved. In this paper, we show that a fixed capacity of a log partition without considering workload characteristics can lead to an unexpected overall performance degradation. Contrary to intuition, we notice that blindly enlarging the capacity of a log partition would also result in worse performance due to the increased garbage collection cost in a data partition, which serves cold data. How many MLCs should impersonate SLCs under a particular workload to achieve an optimized performance is still an open question. To answer this question, we first measure write costs on each partition and their impact on the overall performance of an SSD. Next, a hardware-validated write cost model is built. Based on the model, we demonstrate that for each workload there always exists an optimal partitioning scheme. Further, to verify the effectiveness of our workload-aware dynamic partitioning strategy, we implement an FTL (flash translation layer) called BROMS (Best Ratio Of MLC to SLC), which adaptively adjusts the capacities of two partitions according to the workload characteristics. Experimental results from a hardware platform show that BROMS outperforms a fixed partitioning scheme by up to 86%.","PeriodicalId":283512,"journal":{"name":"Proceedings of the Second International Symposium on Memory Systems","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115402096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reliability and Performance Trade-off Study of Heterogeneous Memories","authors":"Manish Gupta, D. Roberts, Mitesh R. Meswani, Vilas Sridharan, D. Tullsen, Rajesh K. Gupta","doi":"10.1145/2989081.2989113","DOIUrl":"https://doi.org/10.1145/2989081.2989113","url":null,"abstract":"Heterogeneous memories, organized as die-stacked in-package and off-package memory, have been a focus of attention by the computer architects to improve memory bandwidth and capacity. Researchers have explored methods and organizations to optimize performance by increasing the access rate to faster die-stacked memory. Unfortunately, reliability of such arrangements has not been studied carefully thus making them less attractive for data centers and mission-critical systems. Field studies show memory reliability depends on device physics as well as on error correction codes (ECC). Due to the capacity, latency, and energy costs of ECC, the performance-critical in-package memories may favor weaker ECC solutions than off-chip. Moreover, these systems are optimized to run at peak performance by increasing access rate to high-performance in-package memory. In this paper, authors use the real-world DRAM failure data to conduct a trade-off study on reliability and performance of Heterogeneous Memory Architectures (HMA). This paper illustrates the problem that an HMA system which only optimizes for performance may suffer from impaired reliability over time. This work also proposes an age-aware access rate control algorithm to ensure reliable operation of long-running systems.","PeriodicalId":283512,"journal":{"name":"Proceedings of the Second International Symposium on Memory Systems","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115685927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reverse Engineering of DRAMs: Row Hammer with Crosshair","authors":"Matthias Jung, C. Rheinländer, C. Weis, N. Wehn","doi":"10.1145/2989081.2989114","DOIUrl":"https://doi.org/10.1145/2989081.2989114","url":null,"abstract":"In this paper we present a technique that reconstructs the physical location of memory cells in a Dynamic Random Access Memory (DRAM) without opening the device package and microscoping the device. Our method consists of an retention error analysis while a temperature gradient is applied to the DRAM device. This enables the extraction of the exact neighborhood relation of each single DRAM cell, which can be used to accomplish Row Hammer attacks in a very targeted way. However, this information can also be used to enhance current DRAM retention error models.","PeriodicalId":283512,"journal":{"name":"Proceedings of the Second International Symposium on Memory Systems","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114094648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast full system memory checkpointing with SSD-aware memory controller","authors":"Jim Stevens, Paul Tschirhart, B. Jacob","doi":"10.1145/2989081.2989126","DOIUrl":"https://doi.org/10.1145/2989081.2989126","url":null,"abstract":"In this paper, we present a novel memory system checkpointing method that very efficiently stores the complete memory state at a given instant in time to a SSD. Our design relies on a modified memory controller that can issue commands directly to the SSD without relying on system software support and SSD controller firmware that is aware of the checkpoint operation. The checkpoint process occurs in the background while foreground operation is allowed to continue. This efficiency enables our checkpointing mechanism to provide value in various applications including supercomputing, cloud computing, and security.","PeriodicalId":283512,"journal":{"name":"Proceedings of the Second International Symposium on Memory Systems","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122942621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nswap2L: Transparently Managing Heterogeneous Cluster Storage Resources for Fast Swapping","authors":"T. Newhall, E. R. Lehman-Borer, Benjamin Marks","doi":"10.1145/2989081.2989107","DOIUrl":"https://doi.org/10.1145/2989081.2989107","url":null,"abstract":"To support data intensive cluster computing, it is increasingly important that node virtual memory (VM) systems make effective use of available fast storage devices for swap or temporary file space. Nswap2L is a novel system that transparently manages a heterogeneous set of storage options commonly found in clusters, including node RAM, disk, flash SSD, PCM, or network storage devices. Nswap2L implements a two-level device driver interface. At the top level, it appears to node operating systems (OSs) as a single, fast, random access device that can be added as a swap partition on cluster nodes. It transparently manages the underlying heterogeneous storage devices, including its own implementation of Network RAM, to which swapped out data are stored. It implements data placement, migration, and prefetching policies that choose which underlying physical devices store swapped-out page data. Its policies incorporate information about device capacity, system load, and the strengths of different physical storage media. By moving device-specific knowledge into Nswap2L, VM policies in the OS can be based solely on typical application access patterns and not on characteristics of underlying physical storage media. Nswap2L's policy decisions are abstracted from the OS, freeing the OS from having to implement specialized policies for different combinations of cluster storage---Nswap2L requires no changes to the OS's VM system. Results of our benchmark tests show that data-intensive applications perform up to 6 times faster on Nswap2L-enabled clusters, and show that our two-level device driver design adds minimal I/O latency to the underlying devices that Nswap2L manages. In addition, we found that even though Nswap2L's Network RAM is faster than any other backing store, its prefetching policy that distributes data over multiple devices results in increased I/O parallelism and can lead to better performance than swapping only to a single underlying device.","PeriodicalId":283512,"journal":{"name":"Proceedings of the Second International Symposium on Memory Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130181698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Use of DRAM with Unrepaired Weak Cells in Computing Systems","authors":"Hao Wang, Yin Li, Xuebin Zhang, Xiaoqing Zhao, Hongbin Sun, Tong Zhang","doi":"10.1145/2989081.2989108","DOIUrl":"https://doi.org/10.1145/2989081.2989108","url":null,"abstract":"In current practice, DRAM manufacturers apply redundancy-repair to decommission all the weak cells that cannot satisfy the target data retention time under the worse-case operational conditions (e.g., the highest operating temperature). However, as the DRAM scaling enters sub-20nm regime, it becomes increasingly challenging to repair all the weak cells at reasonable cost. This work studies how one could use DRAM chips with unrepaired weak cells in computing systems. In particular, this work is based upon the simple idea that OS reserves all the error-prone pages, which contain at least one unrepaired weak cell, from being used. Under a relatively high error-prone page rate (e.g., 8%), this basic idea is subject to two issues: (1) Simply reserving all the error-prone pages could make it almost impossible for OS to allocate a continuous fragmentation-free physical memory space for some critical operations such as OS booting and DMA buffering. (2) Since most error-prone pages may only contain few unrepaired weak cells, reserving all the error-prone pages from practical usage could cause noticeable memory resource waste. Aiming to address these issues, this paper presents a controller-based selective page re-mapping strategy to ensure a continuous critical memory region for OS, and develops a software-based memory error tolerance scheme to re-cycle all the error-prone pages for the zRAM function in Linux. Since the first scheme only eliminates the fragmentation in the critical memory region (e.g., 128MB in Linux), the remaining non-critical memory region is still subject to severe fragmentation. Hence, we carried out experiments using SPEC CPU2006 to quantitatively demonstrate that highly fragmented non-critical memory region may not cause significant computing system performance degradation. We further study the latency and hardware cost of implementing the controller-based page re-mapping, and the effectiveness of re-cycling error-prone pages for zRAM in Linux. The experimental results show that our proposed software-based error tolerance scheme degrades the speed performance of zRAM by only up to 7%.","PeriodicalId":283512,"journal":{"name":"Proceedings of the Second International Symposium on Memory Systems","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129516712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exposing the Locality of Heterogeneous Memory Architectures to HPC Applications","authors":"Brice Goglin","doi":"10.1145/2989081.2989115","DOIUrl":"https://doi.org/10.1145/2989081.2989115","url":null,"abstract":"High-performance computing requires a deep knowledge of the hardware platform to fully exploit its computing power. The performance of data transfer between cores and memory is becoming critical. Therefore locality is a major area of optimization on the road to exascale. Indeed, tasks and data have to be carefully distributed on the computing and memory resources. We discuss the current way to expose processor and memory locality information in the Linux kernel and in user-space libraries such as the hwloc software project. The current de facto standard structural modeling of the platform as the tree is not perfect, but it offers a good compromise between precision and convenience for HPC runtimes. We present an in-depth study of the software view of the upcoming Intel Knights Landing processor. Its memory locality cannot be properly exposed to user-space applications without a significant rework of the current software stack. We propose an extension of the current hierarchical platform model in hwloc. It correctly exposes new heterogeneous architectures with high-bandwidth or non-volatile memories to applications, while still being convenient for affinity-aware HPC runtimes.","PeriodicalId":283512,"journal":{"name":"Proceedings of the Second International Symposium on Memory Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130625915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}