{"title":"Towards Workload-Aware Page Cache Replacement Policies for Hybrid Memories","authors":"Ahsen J. Uppal, Mitesh R. Meswani","doi":"10.1145/2818950.2818978","DOIUrl":"https://doi.org/10.1145/2818950.2818978","url":null,"abstract":"Die-stacked DRAM is an emerging technology that is expected to be integrated in future systems with off-package memories resulting in a hybrid memory system. A large body of recent research has investigated the use of die-stacked dynamic random-access memory (DRAM) as a hardware-manged last-level cache. This approach comes at the costs of managing large tag arrays, increased hit latencies, and potentially significant increases in hardware verification costs. An alternative approach is for the operating system (OS) to manage the die-stacked DRAM as a page cache for off-package memories. However, recent work in OS-managed page cache focuses on FIFO replacement and related variants as the baseline management policy. In this paper, we take a step back and investigate classical OS page replacement policies and re-evaluate them for hybrid memories. We find that when we use different die-stacked DRAM sizes, the choice of best management policy depends on cache size and application, and can result in as much as a 13X performance difference. Furthermore, within a single application run, the choice of best policy varies over time. We also evaluate co-scheduled workload pairs and find that the best policy varies by workload pair and cache configuration, and that the best-performing policy is typically the most fair. Our research motivates us to continue our investigation for developing workload-aware and cache configuration-aware page cache management policies.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"35 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131490725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MEMST: Cloning Memory Behavior using Stochastic Traces","authors":"Ganesh Balakrishnan, Yan Solihin","doi":"10.1145/2818950.2818971","DOIUrl":"https://doi.org/10.1145/2818950.2818971","url":null,"abstract":"Memory Controller and DRAM architecture are critical aspects of Chip Multi Processor (CMP) design. A good design needs an in-depth understanding of end-user workloads. However, designers rarely get insights into end-user workloads because of the proprietary nature of source code or data. Workload cloning is an emerging approach that can bridge this gap by creating a proxy for the proprietary workload (clone). Cloning involves profiling workloads to glean key statistics and then generating a clone offline for use in the design environment. However, there are no existing cloning techniques for accurately capturing memory controller and DRAM behavior that can be used by designers for a wide design space exploration. We propose Memory EMulation using Stochastic Traces, MEMST, a highly accurate black box cloning framework for capturing DRAM and MC behavior. We provide a detailed analysis of statistics that are necessary to model a workload accurately. We will also show how a clone can be generated from these statistics using a novel stochastic method. Finally, we will validate our framework across a wide design space by varying DRAM organization, address mapping, DRAM frequency, page policy, scheduling policy, input bus bandwidth, chipset latency, DRAM die revision, DRAM generation and DRAM refresh policy. We evaluated MEMST using CPU2006, BioBench, Stream and PARSEC benchmark suites across the design space for single-core, dual-core, quad-core and octa-core CMPs. We measured both performance and power metrics for the original workload and clones. The clones show a very high degree of correlation with the original workload for over 7900 data points with an average error of 1.8% and 1.6% for transaction latency and DRAM power respectively.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114149962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rethinking Design Metrics for Datacenter DRAM","authors":"M. Awasthi","doi":"10.1145/2818950.2818973","DOIUrl":"https://doi.org/10.1145/2818950.2818973","url":null,"abstract":"Over the years, the evolution of DRAM has provided a little improvement in access latencies, but has been optimized to deliver greater peak bandwidths from the devices. The combined bandwidth in a contemporary multi-socket server system runs into hundreds of GB/s. However datacenter scale applications running on server platforms care largely about having access to a large pool of low-latency main memory (DRAM), and in the best case, are unable to utilize even a small fraction of the total memory bandwidth. In this extended abstract, we use measured data from the state-of-the-art servers running memory intensive datacenter workloads like Memcached to argue for main memory design to steer away from optimizing traditional metrics for DRAM design like peak bandwidth so as to be able to cater the growing needs to the datacenter server industry for high density, low latency memory with moderate bandwidth requirements.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127906438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"S-L1: A Software-based GPU L1 Cache that Outperforms the Hardware L1 for Data Processing Applications","authors":"Reza Mokhtari, M. Stumm","doi":"10.1145/2818950.2818969","DOIUrl":"https://doi.org/10.1145/2818950.2818969","url":null,"abstract":"Implementing a GPU L1 data cache entirely in software to usurp the hardware L1 cache sounds counter-intuitive. However, we show how a software L1 cache can perform significantly better than the hardware L1 cache for data-intensive streaming (i.e., \"Big-Data\") GPGPU applications. Hardware L1 data caches can perform poorly on current GPUs, because the size of the L1 is far too small and its cache line size is too large given the number of threads that typically need to run in parallel. Our paper makes two contributions. First, we experimentally characterize the performance behavior of modern GPU memory hierarchies and in doing so identify a number of bottlenecks. Secondly, we describe the design and implementation of a software L1 cache, S-L1. On ten streaming GPGPU applications, S-L1 performs 1.9 times faster, on average, when compared to using the default hardware L1, and 2.1 times faster, on average, when compared to using no L1 cache.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125789018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implications of Memory Interference for Composed HPC Applications","authors":"Brian Kocoloski, Yuyu Zhou, B. Childers, J. Lange","doi":"10.1145/2818950.2818965","DOIUrl":"https://doi.org/10.1145/2818950.2818965","url":null,"abstract":"The cost of inter-node I/O and data movement is becoming increasingly prohibitive for large scale High Performance Computing (HPC) applications. This trend is leading to the emergence of composed in situ applications that co-locate multiple components on the same node. However, these components may contend for underlying memory system resources. In this extended research abstract, we present a preliminary evaluation of the impacts of contention for shared resources in the memory hierarchy, including the last level cache (LLC) and DRAM bandwidth. We show that even modest levels of memory contention can have substantial performance implications for some benchmarks, and argue for a cross layer approach to resource partitioning and scheduling on future HPC systems.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125933119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Instruction Offloading with HMC 2.0 Standard: A Case Study for Graph Traversals","authors":"Lifeng Nai, Hyesoon Kim","doi":"10.1145/2818950.2818982","DOIUrl":"https://doi.org/10.1145/2818950.2818982","url":null,"abstract":"Processing in Memory (PIM) was first proposed decades ago for reducing the overhead of data movement between core and memory. With the advances in 3D-stacking technologies, recently PIM architectures have regained researchers' attentions. Several fully-programmable PIM architectures as well as programming models were proposed in previous literature. Meanwhile, memory industry also starts to integrate computation units into Hybrid Memory Cube (HMC). In HMC 2.0 specification, a number of atomic instructions are supported. Although the instruction support is limited, it enables us to offload computations at instruction granularity. In this paper, we present a preliminary study of instruction offloading on HMC 2.0 using graph traversals as an example. By demonstrating the programmability and performance benefits, we show the feasibility of an instruction-level offloading PIM architecture.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134380293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Near memory data structure rearrangement","authors":"M. Gokhale, G. S. Lloyd, C. Hajas","doi":"10.1145/2818950.2818986","DOIUrl":"https://doi.org/10.1145/2818950.2818986","url":null,"abstract":"As CPU core counts continue to increase, the gap between compute power and available memory bandwidth has widened. A larger and deeper cache hierarchy benefits locality-friendly computation, but offers limited improvement to irregular, data intensive applications. In this work we explore a novel approach to accelerating these applications through in-memory data restructuring. Unlike other proposed processing-in-memory architectures, the rearrangement hardware performs data reduction, not compute offload. Using a custom FPGA emulator, we quantitatively evaluate performance and energy benefits of near-memory hardware structures that dynamically restructure in-memory data to cache-friendly layout, minimizing wasted memory bandwidth. Our results on representative irregular benchmarks using the Micron Hybrid Memory Cube memory model show speedup, bandwidth savings, and energy reduction. We present an API for the near-memory accelerator and describe the interaction between the CPU and the rearrangement hardware with application examples. The merits of an SRAM vs. a DRAM scratchpad buffer for rearranged data are explored.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134525205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Memory Pressure Aware Ballooning","authors":"Jinchun Kim, Viacheslav V. Fedorov, Paul V. Gratz, A. Reddy","doi":"10.1145/2818950.2818967","DOIUrl":"https://doi.org/10.1145/2818950.2818967","url":null,"abstract":"Hardware virtualization is a major component of large scale server and data center deployments due to their facilitation of server consolidation and scalability. Virtualization, however, comes at a high cost in terms of system main memory utilization. Current virtual machine (VM) memory management solutions impose a high performance penalty and are oblivious to the operating regime of the system. Therefore, there is a great need for low-impact VM memory management techniques which are aware of and reactive to current system state, to drive down the overheads of virtualization. We observe that the host machine operates under different memory pressure regimes, as the memory demand from guest VMs changes dynamically at runtime. Adapting to this runtime system state is critical to reduce the performance cost of VM memory management. In this paper, we propose a novel dynamic memory management policy called Memory Pressure Aware (MPA) ballooning. MPA ballooning dynamically allocates memory resources to each VM based on the current memory pressure regime. Moreover, MPA ballooning proactively reacts and adapts to sudden changes in memory demand from guest VMs. MPA ballooning requires neither additional hardware support, nor incurs extra minor page faults in its memory pressure estimation. We show that MPA ballooning provides an 13.2% geomean speed-up versus the current ballooning techniques across a set of application mixes running in guest VMs; often yielding performance nearly identical to that of a non-memory constrained system.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"287 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114549057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Software Techniques for Scratchpad Memory Management","authors":"Paul Sebexen, Thomas Sohmers","doi":"10.1145/2818950.2818966","DOIUrl":"https://doi.org/10.1145/2818950.2818966","url":null,"abstract":"Scratchpad memory is commonly encountered in embedded systems as an alternative or supplement to caches [3], however, cache-containing architectures continue to be preferred in many applications due to their general ease of programmability. A language-agnostic software management system is envisioned that improves portability to scratchpad architectures and significantly lowers power consumption of ported applications. We review a selection of existing techniques, discuss their applicability to various memory systems, and identify opportunities for applying new methods and optimizations to improve memory management on relevant architectures.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130978381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Architecture Exploration for Data Intensive Applications","authors":"Fernando Martin del Campo, P. Chow","doi":"10.1145/2818950.2818970","DOIUrl":"https://doi.org/10.1145/2818950.2818970","url":null,"abstract":"This paper presents Compass, a hardware/software simulator for data-intensive applications. Currently focusing on in-memory stores, the objective of the simulator is to explore diverse algorithms and hardware architectures, serving as an aid to design systems for applications in which the elevated rate of data transfers dictates their behaviour. Instead of simulating the devices of a conventional computing system, in Compass the modules represent the stages of the procedure to attend a request to store, retrieve, or delete information in a particular memory architecture, giving the simulator the flexibility to test and analyze several different algorithms, components, and ideas. The system maintains a cycle-accurate model that makes it easy to interface it with simulators of physical devices such as RAM memories. Under a scheme like this one, the simulator of a physical memory in the system anchors the timing to a realistic scenario, but the rest of the components can be easily modified to explore alternative approaches.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124848696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}