Workshop on Memory System Performance and Correctness: Latest Papers

A study of data structures with a deep heap shape
Workshop on Memory System Performance and Correctness Pub Date: 2013-06-16 DOI: 10.1145/2492408.2492413
Haggai Eran, E. Petrank
Abstract: Computing environments are becoming increasingly parallel, and it seems likely that we will see more cores on tomorrow's desktop and server platforms. In a highly parallel system, tracing garbage collectors may not scale well because deep heap structures hinder parallel tracing. Previous work has discovered such vulnerabilities within standard Java benchmarks. In this work we examine these standard benchmarks and analyze them to expose the data structures that cause current Java benchmarks to create deep heap shapes. It turns out that the problem manifests mostly in benchmarks that employ queues and linked lists. We then propose a new construction of a lock-free queue data structure with extra references that enables better garbage collector parallelism at low overhead.
Citations: 5
A new perspective on processing-in-memory architecture design
Workshop on Memory System Performance and Correctness Pub Date: 2013-06-16 DOI: 10.1145/2492408.2492418
D. Zhang, N. Jayasena, Alexander Lyashevsky, J. Greathouse, Mitesh R. Meswani, Mark Nutter, Mike Ignatowski
Abstract: As computation becomes increasingly limited by data movement and energy consumption, exploiting locality throughout the memory hierarchy becomes critical for maintaining the performance scaling that many have come to expect from the computing industry. Moving computation closer to main memory presents an opportunity to reduce the overheads associated with data movement. We explore the potential of using 3D die stacking to move memory-intensive computations closer to memory. This approach to processing-in-memory addresses some drawbacks of prior research on in-memory computing and appears commercially viable in the foreseeable future. We show promising early results from this approach and identify areas that need further research to unlock its full potential.
Citations: 47
Software-controlled transparent management of heterogeneous memory resources in virtualized systems
Workshop on Memory System Performance and Correctness Pub Date: 2013-06-16 DOI: 10.1145/2492408.2492416
Min Lee, Vishal Gupta, K. Schwan
Abstract: This paper presents a software-controlled technique for managing the heterogeneous memory resources of next-generation multicore platforms with fast 3D die-stacked memory and additional slow off-chip memory. Implemented for virtualized server systems, the technique detects the 'hot' pages critical to program performance and then keeps them in the scarce fast 3D memory. Challenges overcome in the technique's implementation include the need to minimize its runtime overheads, the lack of direct hypervisor-level visibility into the memory access behavior of guest virtual machines, and the need to make page migration transparent to guests. This paper presents hypervisor-level mechanisms that (i) build a page access history of virtual machines by periodically scanning page-table access bits, and (ii) intercept guest page-table operations to create mirrored page tables and enable guest-transparent page migration. The methods are implemented in the Xen hypervisor and evaluated on a larger-scale multicore platform. The resulting ability to characterize the memory behavior of representative server workloads demonstrates the feasibility of software-managed heterogeneous memory resources.
Citations: 11
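The access-bit scanning idea in the abstract above can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not the paper's Xen implementation: the dictionary-based `page_table`, the `scan_access_bits` and `pick_hot_pages` helpers, and the policy of ranking pages by how many scan intervals saw them accessed are all hypothetical choices made for illustration.

```python
from collections import Counter

def scan_access_bits(page_table, history):
    """One scan pass: count pages whose accessed bit is set, then clear the bit."""
    for page, entry in page_table.items():
        if entry["accessed"]:
            history[page] += 1
            entry["accessed"] = False  # cleared so the next interval starts fresh

def pick_hot_pages(history, fast_capacity):
    """Return the pages seen in the most scan intervals, up to the fast-memory capacity."""
    return [page for page, _ in history.most_common(fast_capacity)]

def simulate(intervals, num_pages, fast_capacity):
    """intervals: one list of touched page numbers per scan interval (assumed input)."""
    page_table = {p: {"accessed": False} for p in range(num_pages)}
    history = Counter()
    for touched in intervals:
        for p in touched:
            page_table[p]["accessed"] = True  # real hardware sets this bit on access
        scan_access_bits(page_table, history)
    return pick_hot_pages(history, fast_capacity)
```

For example, `simulate([[0, 1], [0, 2], [0, 1], [0, 3]], num_pages=4, fast_capacity=2)` selects pages 0 and 1, which were accessed in the most intervals. A real system would also need to migrate the selected pages and bound the cost of each scan.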
APE: accelerator processor extensions to optimize data-compute co-location
Workshop on Memory System Performance and Correctness Pub Date: 2013-06-16 DOI: 10.1145/2492408.2492412
Ganesh Venkatesh
Abstract: Two technological trends in current systems are the march towards many-core systems and a greater focus on power efficiency. The increase in core counts would result in smaller caches per compute node and greater reliance on exposing task-level parallelism in applications. However, this would potentially increase the amount of data that moves within and between the different tasks and hence the related power costs. This will pose a new burden on already power-constrained systems. The situation will only get worse going forward, because the power consumed by the wires is not scaling down much with each technology generation, while the amount of data that these wires move is increasing per generation. This paper addresses this concern by identifying the memory access patterns that account for much of the data movement and designing processor extensions, APEs, to support them. These processor extensions are placed closer to the cache structures, rather than the core pipeline, to reduce data movement and improve compute-data co-location. We show that by doing this we are able to reduce a task's memory accesses by ~2.5×, data movement by 4× and cache miss rate by 40% for a wide range of applications.
Citations: 2
Analyzing locality of memory references in GPU architectures
Workshop on Memory System Performance and Correctness Pub Date: 2013-06-16 DOI: 10.1145/2492408.2492423
Saurabh Gupta, Ping Xiang, Huiyang Zhou
Abstract: In this paper we advocate formal locality analysis of the memory references of GPGPU kernels. We investigate the locality of reference at different cache levels in the memory hierarchy. At the L1 cache level, we look into the locality behavior at the warp, thread-block and streaming-multiprocessor levels. Using matrix multiplication as a case study, we show that our locality analysis accurately captures some interesting and counter-intuitive behavior of the memory accesses. We believe that such analysis will provide useful insights for understanding memory access behavior and optimizing the memory hierarchy in GPU architectures.
Citations: 9
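The abstract above does not say which locality measure the analysis uses. As an illustration only, the sketch below assumes reuse distance, a common formal measure: for each access, the number of distinct addresses touched since the previous access to the same address. The function names and the list-of-addresses trace format are hypothetical.

```python
def reuse_distances(trace):
    """For each access, return the number of distinct addresses touched since
    the previous access to the same address, or None for a first access."""
    last_seen = {}
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            # Distinct addresses strictly between the two accesses to addr.
            distances.append(len(set(trace[last_seen[addr] + 1 : i])))
        else:
            distances.append(None)
        last_seen[addr] = i
    return distances

def lru_hits(trace, capacity):
    """Count hits in a fully associative LRU cache holding `capacity` addresses:
    an access hits exactly when its reuse distance is below the capacity."""
    return sum(1 for d in reuse_distances(trace) if d is not None and d < capacity)
```

For the trace `a b c a b a`, the reuse distances are `None, None, None, 2, 2, 1`, so a two-entry LRU cache sees one hit. The same computation could be applied separately to the accesses of one warp, one thread block, or one streaming multiprocessor by filtering the trace first. This brute-force version is quadratic in the trace length and is meant for small examples.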
Introducing kernel-level page reuse for high performance computing
Workshop on Memory System Performance and Correctness Pub Date: 2013-06-16 DOI: 10.1145/2492408.2492414
S. Valat, Marc Pérache, W. Jalby
Abstract: Due to the evolution of computer architecture, more and more HPC applications have to include thread-based parallelism and take care of memory consumption. Such changes require more attention to the full memory management chain, which is particularly stressed in a multi-threaded context. Several memory allocators provide better scalability on the user-space side, but with the steadily increasing number of cores, the impact of the operating system can no longer be neglected. We measured a performance impact of the OS memory subsystem of up to one third of the total execution time of a real application on 128 cores. On modern architectures, we measured that up to 40% of the page fault time is spent in page zeroing. In this paper, we detail a proposal to improve paging performance by removing the need for this unproductive page zeroing through an extension of the mmap semantics. To this end, we added a per-process kernel-level memory page pool to locally reuse free pages without resetting their content. Our experiments show significant performance improvements, especially for huge pages.
Citations: 10
Software-level scheduling to exploit non-uniformly shared data cache on GPGPU
Workshop on Memory System Performance and Correctness Pub Date: 2013-06-16 DOI: 10.1145/2492408.2492421
Bo Wu, Weilin Wang, Xipeng Shen
Abstract: Data caches were introduced to GPUs to mitigate the irregular memory access problem, but few studies have investigated how to exploit their full potential. In this work, we consider some important GPU applications that feature data sharing across thread blocks. We show that the sharing is not well exploited because current GPU runtimes ignore this factor when scheduling threads. We then present an application-level transformation to remap thread blocks to data on the fly. With the software-level scheduler, thread blocks with much data sharing are scheduled to share the cache on a streaming multiprocessor (SM). Experiments on four benchmarks show a 1.23× speedup on average.
Citations: 0
A coldness metric for cache optimization
Workshop on Memory System Performance and Correctness Pub Date: 2013-06-16 DOI: 10.1145/2492408.2492419
Raj Parihar, C. Ding, Michael C. Huang
Abstract: A "hot" concept in program optimization is hotness. For example, program optimization targets hot paths, and register allocation targets hot variables. Cache optimization, however, has to target cold data, which are less frequently used and tend to cause cache misses whenever they are accessed. Hot data, in contrast, are small and frequently used, and so tend to stay in cache. In this paper, we define a new metric called "coldness" and show how coldness varies across programs and how much colder the data we have to optimize becomes as the cache size on modern machines increases.
Citations: 1
Cache rationing for multicore
Workshop on Memory System Performance and Correctness Pub Date: 2013-06-16 DOI: 10.1145/2492408.2492422
Jacob Brock, C. Ding
Abstract: As the number of transistors on a chip increases, they are used mainly in two ways on multicore processors: first, to increase the number of cores, and second, to increase the size of cache memory. The two approaches intersect at a basic problem, which is how parallel tasks can best share the cache memory. The degree of sharing determines the available cache resource for each core and hence the memory performance and scalability of the system. In this paper, cache rationing is presented as a cache sharing solution for collaborative caching.
Citations: 0
All-window data liveness
Workshop on Memory System Performance and Correctness Pub Date: 2013-06-16 DOI: 10.1145/2492408.2492420
Pengcheng Li, C. Ding
Abstract: This paper proposes a new metric called all-window liveness, which is the average amount of live data in all time windows of a given length. The paper gives a linear-time algorithm to compute the average liveness for all window lengths and discusses potential uses of the new metric.
Citations: 8
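The definition in the abstract above can be made concrete with a brute-force sketch. This is not the paper's linear-time algorithm. It assumes, for illustration, that each object is described by a half-open lifetime `[alloc, free)` in logical time, that a window of length `k` starting at `t` covers `[t, t + k)`, and that an object counts as live in a window when its lifetime overlaps the window. The function name and input format are hypothetical.

```python
def window_liveness(lifetimes, trace_length, k):
    """Average number of live objects over all windows of length k.

    lifetimes: list of (alloc, free) pairs, live on the half-open range [alloc, free)
    trace_length: total number of time steps in the trace
    k: window length, with 1 <= k <= trace_length
    """
    windows = trace_length - k + 1
    total = 0
    for start in range(windows):
        end = start + k
        # An object is counted if its lifetime overlaps the window [start, end).
        total += sum(1 for alloc, free in lifetimes if alloc < end and free > start)
    return total / windows
```

For two objects with lifetimes `(0, 2)` and `(1, 4)` on a trace of length 4, windows of length 1 contain 1, 2, 1 and 1 live objects, so `window_liveness([(0, 2), (1, 4)], 4, 1)` is 1.25. Each call costs time proportional to the number of windows times the number of objects, and computing the metric for every window length this way repeats that work per length, which is the cost a linear-time algorithm avoids.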