Proceedings of the 2015 International Symposium on Memory Systems: Latest Publications

High Performance Computing Co-Design Strategies
Proceedings of the 2015 International Symposium on Memory Systems. Pub Date: 2015-10-05. DOI: 10.1145/2818950.2818959
J. Ang
Abstract: The MEMSYS Call for Papers contains this passage: "Many of the problems we see in the memory system are cross-disciplinary in nature: their solution would likely require work at all levels, from applications to circuits. Thus, while the scope of the problem is memory, the scope of the solutions will be much wider." The Department of Energy's (DOE) high performance computing (HPC) community is thinking about how to define, support, and execute work at all levels for the development of future supercomputers to run our portfolio of mission applications. Borrowing a concept from embedded computing, the DOE HPC community calls this work at all levels co-design [1]. Co-design for embedded computing focuses on hardware/software partitioning of activities to execute a well-defined task within specific constraints. Co-design for general-purpose HPC has many dimensions for both the work to be performed and the constraints, e.g., hardware designs, runtime software, applications, and algorithms. This extended abstract describes two alternative DOE HPC co-design strategies. While DOE co-design efforts include more than the memory system, as noted in the MEMSYS call, the memory system impacts applications, circuits, and all levels in between.
Citations: 1
Anatomy of GPU Memory System for Multi-Application Execution
Proceedings of the 2015 International Symposium on Memory Systems. Pub Date: 2015-10-05. DOI: 10.1145/2818950.2818979
Adwait Jog, Onur Kayiran, Tuba Kesten, Ashutosh Pattnaik, Evgeny Bolotin, Niladrish Chatterjee, S. Keckler, M. Kandemir, C. Das
Abstract: As GPUs make headway across the computing landscape, spanning mobile platforms, supercomputers, and cloud and virtual desktop platforms, supporting concurrent execution of multiple applications on GPUs becomes essential for unlocking their full potential. However, unlike on CPUs, multi-application execution on GPUs is little explored. In this paper, we study the memory system of GPUs in a concurrently executing multi-application environment. We first present an analytical performance model for many-threaded architectures and show that the common use of misses-per-kilo-instruction (MPKI) as a proxy for performance is not accurate without considering the bandwidth usage of applications. We characterize the memory interference of applications and discuss the limitations of existing memory schedulers in mitigating this interference. We extend the analytical model to multiple applications and identify the key metrics for controlling various performance metrics. We conduct extensive simulations using an enhanced version of GPGPU-Sim targeted at concurrently executing multiple applications, and show that memory scheduling decisions based on MPKI and bandwidth information are more effective at enhancing throughput than the traditional FR-FCFS and the recently proposed RR FR-FCFS policies.
Citations: 82
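The abstract's point that MPKI alone is a poor performance proxy can be illustrated with simple arithmetic. The sketch below is not the authors' analytical model; it only converts MPKI into an implied off-chip bandwidth demand, assuming every miss moves one full cache line (a 128-byte line is an illustrative assumption) and ignoring writebacks and prefetches. Two hypothetical applications with identical MPKI but different instruction throughput place very different loads on DRAM.

```python
def mpki(misses: int, instructions: int) -> float:
    """Misses per kilo-instruction (per 1000 instructions)."""
    return misses * 1000 / instructions


def bandwidth_demand_gbps(mpki_value: float, instr_per_sec: float,
                          line_bytes: int = 128) -> float:
    """Off-chip bandwidth implied by a miss rate, in GB/s.

    Assumes each miss transfers one full line of `line_bytes` (an assumed
    128-byte line) and ignores writebacks and prefetch traffic.
    """
    misses_per_sec = mpki_value / 1000 * instr_per_sec
    return misses_per_sec * line_bytes / 1e9


# Two hypothetical applications with the same MPKI (20) but different
# instruction throughput.
app_a = bandwidth_demand_gbps(mpki(20_000, 1_000_000), instr_per_sec=1e11)
app_b = bandwidth_demand_gbps(mpki(20_000, 1_000_000), instr_per_sec=1e10)
print(f"A: {app_a:.1f} GB/s, B: {app_b:.1f} GB/s")
```

Because application A issues instructions ten times faster, it demands ten times the bandwidth at the same MPKI, which is why a scheduler that looks only at MPKI can misjudge interference.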
Omitting Refresh: A Case Study for Commodity and Wide I/O DRAMs
Proceedings of the 2015 International Symposium on Memory Systems. Pub Date: 2015-10-05. DOI: 10.1145/2818950.2818964
Matthias Jung, Éder F. Zulian, Deepak M. Mathew, M. Herrmann, Christian Brugger, C. Weis, N. Wehn
Abstract: Dynamic Random Access Memories (DRAMs) have a big impact on performance and contribute significantly to total power consumption in systems ranging from mobile devices to servers. Up to half of the power consumption of future high-density DRAM devices will be caused by refresh commands. Moreover, the refresh rate depends not only on the device capacity but also strongly on temperature. In the case of 3D integration of MPSoCs with Wide I/O DRAMs, power density and thermal dissipation increase dramatically; hence, 3D DRAM requires even more refresh operations. To master these challenges, clever DRAM refresh strategies are mandatory at either the hardware or the software level, using new or already available infrastructure and implementations such as Partial Array Self Refresh (PASR) or Temperature Compensated Self Refresh (TCSR). In this paper, we show that for dedicated applications, refresh can be disabled completely with no or negligible impact on application performance. This is possible if it is guaranteed either that the lifetime of the data is shorter than the currently required DRAM refresh period or that the application can tolerate bit errors to some degree within a given time window.
Citations: 41
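The two conditions under which refresh can be omitted can be written as a simple predicate. The sketch below assumes DDR3/DDR4-style refresh rules (a 64 ms refresh window up to 85 °C, halved to 32 ms in the extended temperature range above 85 °C); Wide I/O parts may differ. The `Buffer` type and `can_omit_refresh` helper are hypothetical, and the boolean `error_tolerant` flag is a simplification of the abstract's "tolerate bit errors to some degree in a given time window".

```python
from dataclasses import dataclass


def required_refresh_period_ms(temp_c: float) -> float:
    """Assumed DDR3/DDR4-style refresh window: 64 ms up to 85 C,
    halved to 32 ms in the extended temperature range above 85 C."""
    return 64.0 if temp_c <= 85.0 else 32.0


@dataclass
class Buffer:
    """Hypothetical description of one region of application data."""
    lifetime_ms: float    # time from when data is written to its last read
    error_tolerant: bool  # simplification: can the app absorb some bit errors?


def can_omit_refresh(buf: Buffer, temp_c: float) -> bool:
    """Hypothetical check mirroring the two conditions in the abstract:
    data dies before refresh is due, or the application tolerates errors."""
    short_lived = buf.lifetime_ms < required_refresh_period_ms(temp_c)
    return short_lived or buf.error_tolerant


frame = Buffer(lifetime_ms=40.0, error_tolerant=False)  # e.g. a short-lived frame buffer
print(can_omit_refresh(frame, temp_c=60.0), can_omit_refresh(frame, temp_c=90.0))
```

The example shows why temperature matters: the same 40 ms data lifetime is safely shorter than the refresh window at 60 °C but not at 90 °C, where the window is assumed to shrink to 32 ms.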
k-Means Clustering on Two-Level Memory Systems
Proceedings of the 2015 International Symposium on Memory Systems. Pub Date: 2015-10-05. DOI: 10.1145/2818950.2818977
M. A. Bender, Jonathan W. Berry, S. Hammond, Branden J. Moore, Benjamin Moseley, C. Phillips
Abstract: In recent work, we quantified the anticipated performance boost when a sorting algorithm is modified to leverage user-addressable "near-memory," which we call scratchpad. This architectural feature is expected in the Intel Knights Landing processors that will be used in DOE's next large-scale supercomputer. This paper extends our analytical study of the scratchpad to k-means clustering, a classical data-analysis technique that is ubiquitous in the literature and in practice. We present new theoretical results using the model introduced in [13], which measures memory transfers and assumes that computations are memory-bound. Our theoretical results indicate that scratchpad-aware versions of k-means clustering can expect performance boosts for high-dimensional instances with relatively few cluster centers. These constraints may limit the practical impact of scratchpad for k-means acceleration, so we discuss their origins and practical implications. We corroborate our theory with experimental runs on a system instrumented to mimic one with scratchpad memory. We also contribute a semi-formalization of the computational properties that are necessary and sufficient to predict a performance boost from scratchpad-aware variants of algorithms. We have observed and studied these properties in the context of sorting, and now clustering. We conclude with some thoughts on applying these properties to new areas. Specifically, we believe that dense linear algebra has properties similar to those of k-means, while sparse linear algebra and FFT computations are more similar to sorting. Sparse operations are more common in scientific computing, so we expect scratchpad to have a significant impact in that area.
Citations: 16
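To make "scratchpad-aware k-means" more concrete, the sketch below runs one Lloyd iteration while streaming points through a fixed-size block that stands in for scratchpad, keeping the k centers and their running sums resident for the whole pass. The parameter `scratchpad_points` and the function `lloyd_step_blocked` are hypothetical; this illustrates blocking only and is not the paper's algorithm or its memory-transfer model. Keeping all centers resident next to a block of points is only feasible when k is small relative to scratchpad capacity.

```python
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_step_blocked(points: List[Point], centers: List[Point],
                       scratchpad_points: int) -> List[List[float]]:
    """One Lloyd (k-means) iteration with points processed in blocks.

    `scratchpad_points` (hypothetical) is how many points fit in near-memory
    at once. Centers, per-cluster sums, and counts stay resident throughout.
    """
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for start in range(0, len(points), scratchpad_points):
        block = points[start:start + scratchpad_points]  # stands in for a copy into scratchpad
        for p in block:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))  # nearest center
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
    # New center = mean of assigned points; empty clusters keep their old center.
    return [[s / counts[j] for s in sums[j]] if counts[j] else list(centers[j])
            for j in range(k)]


pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(lloyd_step_blocked(pts, [[0.0, 0.0], [10.0, 10.0]], scratchpad_points=3))
```

The block size changes only how data moves between memory levels, not the result: any `scratchpad_points` value produces the same new centers.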
MMC: a Many-core Memory Connection Model
Proceedings of the 2015 International Symposium on Memory Systems. Pub Date: 2015-10-05. DOI: 10.1145/2818950.2818958
C. Ding, Hao Lu, Chencheng Ye
Abstract: This extended abstract formulates a model of parallel performance called MMC. It gives a theoretical upper bound on parallel performance based on three factors: processing capacity, network capacity, and memory capacity.
Citations: 0
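The abstract does not say how MMC combines its three factors. Purely as an illustration, and as an assumption rather than the MMC formulation, a roofline-style reading treats each capacity as a ceiling: attainable performance cannot exceed the processing rate, nor the rate at which the network or memory can deliver data multiplied by how many operations each delivered byte supports. All parameter names below are hypothetical.

```python
def performance_upper_bound(proc_ops_per_s: float,
                            net_bytes_per_s: float, ops_per_net_byte: float,
                            mem_bytes_per_s: float, ops_per_mem_byte: float) -> float:
    """Hypothetical roofline-style bound (not the MMC formula).

    Each resource imposes its own ceiling on operations per second;
    the tightest ceiling is the bound.
    """
    return min(proc_ops_per_s,                        # processing ceiling
               net_bytes_per_s * ops_per_net_byte,    # network-fed ceiling
               mem_bytes_per_s * ops_per_mem_byte)    # memory-fed ceiling


# Hypothetical machine: fast cores, but the network feeds too little data.
print(performance_upper_bound(1e12, 1e10, 10, 1e11, 2))
```

In this made-up configuration the network is the binding constraint, so adding cores would not raise the bound.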
Understanding Energy Aspects of Processing-near-Memory for HPC Workloads
Proceedings of the 2015 International Symposium on Memory Systems. Pub Date: 2015-10-05. DOI: 10.1145/2818950.2818985
Hyojong Kim, Hyesoon Kim, S. Yalamanchili, Arun Rodrigues
Abstract: Interest in processing-near-memory (PNM) has been reignited by recent improvements in 3D integration technology. In this work, we analyze the energy consumption characteristics of a system comprising a conventional processor and a 3D memory stack with fully programmable cores. We construct a high-level analytical energy model based on the underlying architecture and the technology with which each component is built. From preliminary experiments with 11 HPC benchmarks from the Mantevo benchmark suite, we observed that the misses per kilo-instruction (MPKI) of the last-level cache (LLC) is one of the most important characteristics in determining how well suited an application is to PNM execution.
Citations: 11
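Why LLC MPKI predicts PNM suitability can be seen from a deliberately coarse energy model. In the hypothetical sketch below, which is not the authors' analytical model, energy per 1000 instructions is core energy plus one memory access per LLC miss; PNM cores are assumed to cost more per instruction but to reach memory more cheaply, and the same MPKI is assumed on both sides. Under those assumptions there is a break-even MPKI above which PNM wins. All parameter values in the example are placeholders, not measurements.

```python
def energy_per_kilo_instr(mpki: float, core_nj_per_kinstr: float,
                          access_nj: float) -> float:
    """Energy (nJ) for 1000 instructions: core energy plus one memory
    access per LLC miss. A deliberately coarse, hypothetical model."""
    return core_nj_per_kinstr + mpki * access_nj


def pnm_break_even_mpki(host_core_nj: float, host_access_nj: float,
                        pnm_core_nj: float, pnm_access_nj: float) -> float:
    """MPKI above which the PNM configuration uses less energy.

    Solves host_core + m * host_access = pnm_core + m * pnm_access for m,
    assuming PNM memory accesses are cheaper than host accesses.
    """
    if host_access_nj <= pnm_access_nj:
        raise ValueError("model assumes PNM accesses are cheaper than host accesses")
    return (pnm_core_nj - host_core_nj) / (host_access_nj - pnm_access_nj)


# Placeholder parameters, chosen only to illustrate the crossover.
print(pnm_break_even_mpki(host_core_nj=100.0, host_access_nj=20.0,
                          pnm_core_nj=150.0, pnm_access_nj=5.0))
```

Applications whose MPKI sits well above the break-even point spend most of their energy on memory traffic, which is exactly the traffic PNM shortens.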
A Data Centric Perspective on Memory Placement
Proceedings of the 2015 International Symposium on Memory Systems. Pub Date: 2015-10-05. DOI: 10.1145/2818950.2818956
Y. Birk, O. Mencer
Abstract: In this paper, we focus on memory in its role as a channel for passing information from one instruction to another, in particular in conjunction with spatial or dataflow computing architectures, wherein the computing elements are laid out like an assembly plant. We point out the opportunity to dramatically increase effective data access bandwidth by moving from a centralized memory array model with a few ports to numerous tiny buffers that can be accessed concurrently. The penalty is a loss of access flexibility, but this flexibility is often a by-product of the memory organization rather than a true need. Improvements in hardware reconfiguration speed and resolution, combined with the definition of standard buffer queuing and routing capabilities and efforts by tool designers and application developers, are likely to extend the applicability of these architectures, offering dramatic power-cost-performance advantages.
Citations: 2
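As a software analogy only, the sketch below models the assembly-plant picture: each stage of a pipeline reads from its own small FIFO and writes to the next, so every buffer has exactly one producer and one consumer, and conceptually all buffers are accessed in the same cycle rather than contending for the few ports of one centralized array. The function `run_pipeline`, its stage functions, and the `depth` parameter are hypothetical illustrations, not the authors' design.

```python
from collections import deque
from typing import Callable, Deque, List


def run_pipeline(inputs: List[int], stages: List[Callable[[int], int]],
                 depth: int = 2) -> List[int]:
    """Cycle-level toy model of a dataflow pipeline with tiny FIFOs.

    fifos[i] feeds stages[i]; fifos[-1] collects results. Each FIFO holds at
    most `depth` items (a hypothetical tiny buffer). Per cycle, each stage
    moves at most one item, so conceptually all FIFOs work in parallel.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    fifos: List[Deque[int]] = [deque() for _ in range(len(stages) + 1)]
    source = deque(inputs)
    out: List[int] = []
    while source or any(fifos):
        # Drain finished results.
        while fifos[-1]:
            out.append(fifos[-1].popleft())
        # Walk stages back to front so a value advances at most one stage per cycle.
        for i in reversed(range(len(stages))):
            if fifos[i] and len(fifos[i + 1]) < depth:
                fifos[i + 1].append(stages[i](fifos[i].popleft()))
        # Inject the next input if the first buffer has room.
        if source and len(fifos[0]) < depth:
            fifos[0].append(source.popleft())
    return out


print(run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 2]))
```

The loss of flexibility the abstract mentions shows up here too: a stage can only consume values in FIFO order from its one input buffer, with no random access to other stages' data.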
Modeling Data Movement in the Memory Hierarchy in HPC Systems
Proceedings of the 2015 International Symposium on Memory Systems. Pub Date: 2015-10-05. DOI: 10.1145/2818950.2818972
Aditya M. Deshpande, J. Draper
Abstract: Increasing core counts and cache sizes in modern processors are increasing data movement across the memory hierarchy. As High Performance Computing (HPC) systems become more and more energy constrained, improving energy efficiency is becoming a necessity. Given its significant impact on system energy efficiency, the cost of data movement in terms of energy and performance cannot be neglected. Conventional techniques for modeling and analyzing data movement across the memory hierarchy have proven inadequate for helping computer architects and system designers optimize data movement. Our work is a position statement emphasizing the need for more detailed data movement modeling tools that better quantify how data movement across the memory hierarchy during application execution affects energy and performance. The hope is that exposing more detailed characteristics of data movement would enable designers to optimize applications and architectures to minimize data movement, which in turn would reduce energy and perhaps even increase performance.
Citations: 3
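One minimal form of the accounting the authors call for is to tally the bytes moved over each link between adjacent levels of the hierarchy and weight them by a per-byte energy cost. In the sketch below, the `movement_energy_pj` helper, the level names, and the energy-per-byte values are hypothetical placeholders, not measurements and not the authors' tool; a real model would also need timing, overlap, and per-access overheads.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Link = Tuple[str, str]            # (from_level, to_level)
Transfer = Tuple[str, str, int]   # (from_level, to_level, bytes moved)


def movement_energy_pj(transfers: Iterable[Transfer],
                       pj_per_byte: Dict[Link, float]) -> Dict[Link, float]:
    """Aggregate bytes per directed link, then convert to energy (pJ).

    Raises KeyError if a link in the trace has no cost entry.
    """
    bytes_per_link = Counter()
    for src, dst, nbytes in transfers:
        bytes_per_link[(src, dst)] += nbytes
    return {link: n * pj_per_byte[link] for link, n in bytes_per_link.items()}


# Hypothetical trace of cache-line fills and placeholder per-byte costs.
trace = [("DRAM", "L2", 64), ("L2", "L1", 64), ("DRAM", "L2", 64)]
costs = {("DRAM", "L2"): 10.0, ("L2", "L1"): 1.0}
print(movement_energy_pj(trace, costs))
```

Breaking energy down per link, rather than reporting one total, is what lets a designer see which level of the hierarchy dominates the cost of moving data.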
HpMC: An Energy-aware Management System of Multi-level Memory Architectures
Proceedings of the 2015 International Symposium on Memory Systems. Pub Date: 2015-10-05. DOI: 10.1145/2818950.2818974
Chun-Yi Su, D. Roberts, E. León, K. Cameron, B. D. Supinski, G. Loh, Dimitrios S. Nikolopoulos
Abstract: DRAM technology faces density and power challenges in increasing capacity because of the limitations of physical cell design. To overcome these limitations, system designers are exploring alternative solutions that combine DRAM and emerging NVRAM technologies. Previous work on heterogeneous memories focuses mainly on two system designs: PCache, a hierarchical, inclusive memory system, and HRank, a flat, non-inclusive memory system. We demonstrate that neither of these designs can universally achieve high performance and energy efficiency across a suite of HPC workloads. In this work, we investigate the impact of a number of multi-level memory designs on the performance, power, and energy consumption of applications. To achieve this goal and overcome the limited number of available tools for studying heterogeneous memories, we created HMsim, an infrastructure that enables n-level, heterogeneous memory studies by leveraging existing memory simulators. We then propose HpMC, a new memory controller design that combines the best aspects of existing management policies to improve performance and energy. Our energy-aware memory management system dynamically switches between PCache and HRank based on the temporal locality of applications. Our results show that HpMC reduces energy consumption by 13% to 45% compared to PCache and HRank, while providing the same bandwidth and higher capacity than a conventional DRAM system.
Citations: 27
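The abstract says HpMC switches between PCache and HRank according to an application's temporal locality but does not describe the mechanism. The following is a hypothetical sketch of such a switch, not HpMC itself: it estimates locality as the fraction of accesses in a recent window that re-touch a page already seen in that window, and uses two thresholds (hysteresis) to avoid flapping between modes. The window size, the thresholds, the `ModeSelector` class, and the rule that strong reuse favors the inclusive, cache-like PCache mode are all assumptions. A hardware controller would use cheap counters rather than rescanning the window on every access.

```python
from collections import deque
from typing import Deque, Hashable, Set


class ModeSelector:
    """Hypothetical PCache/HRank switch driven by a temporal-locality estimate."""

    def __init__(self, window: int = 1000, high: float = 0.6, low: float = 0.4):
        self.window: Deque[Hashable] = deque(maxlen=window)  # recent page accesses
        self.high, self.low = high, low  # hysteresis thresholds (assumed values)
        self.mode = "HRank"

    def locality(self) -> float:
        """Fraction of accesses in the window that re-touch an earlier page."""
        seen: Set[Hashable] = set()
        reuses = 0
        for page in self.window:
            if page in seen:
                reuses += 1
            seen.add(page)
        return reuses / len(self.window) if self.window else 0.0

    def access(self, page: Hashable) -> str:
        """Record one page access and return the (possibly updated) mode."""
        self.window.append(page)
        r = self.locality()
        if self.mode == "HRank" and r >= self.high:
            self.mode = "PCache"  # strong reuse: caching hot pages in DRAM pays off
        elif self.mode == "PCache" and r <= self.low:
            self.mode = "HRank"   # weak reuse: flat placement avoids cache-fill cost
        return self.mode


sel = ModeSelector(window=10)
print([sel.access(1) for _ in range(4)])          # repeated page: high reuse
print([sel.access(p) for p in range(100, 110)])   # streaming pages: reuse fades
```

Using separate switch-up and switch-down thresholds means the mode changes only after locality has clearly moved, which matters because each switch between an inclusive and a flat organization would itself cost data migration.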
Shared Last-Level Caches and The Case for Longer Timeslices
Proceedings of the 2015 International Symposium on Memory Systems. Pub Date: 2015-10-05. DOI: 10.1145/2818950.2818968
Viacheslav V. Fedorov, A. Reddy, Paul V. Gratz
Abstract: Memory performance is important in modern systems. Contention at various levels of the memory hierarchy can lead to significant application performance degradation due to interference. Further, modern large last-level caches (LLCs) have fill times greater than the OS scheduling window. When several threads run concurrently and timeshare the CPU cores, they may never be able to load their working sets into the cache before being rescheduled, and thus remain permanently stuck in the "cold-start" regime. We show that by increasing the system scheduling timeslice length, it is possible to amortize the cache cold-start penalty due to multitasking and improve application performance by 10-15%.
Citations: 2
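A back-of-the-envelope calculation shows why fill time relative to the timeslice matters. If a thread's working set must be refetched line by line after each context switch, the warm-up cost is roughly (lines to refetch) × (effective time per miss), and the share of each timeslice spent warming up shrinks as the timeslice grows. The 64-byte line, the 10 ns effective (overlap-adjusted) cost per miss, the 32 MiB working set, and the timeslice values below are illustrative assumptions, not figures from the paper.

```python
def fill_time_ms(working_set_bytes: int, line_bytes: int = 64,
                 ns_per_miss: float = 10.0) -> float:
    """Time to refill a working set after a context switch, in ms.

    Assumes each line costs `ns_per_miss` of effective (overlap-adjusted)
    latency; both defaults are illustrative assumptions.
    """
    lines = working_set_bytes // line_bytes
    return lines * ns_per_miss / 1e6


def warmup_fraction(fill_ms: float, timeslice_ms: float) -> float:
    """Share of a timeslice spent refilling the cache (capped at 1.0)."""
    return min(1.0, fill_ms / timeslice_ms)


fill = fill_time_ms(32 * 1024 * 1024)  # hypothetical 32 MiB working set
for ts in (4.0, 100.0):
    print(f"timeslice {ts} ms: {warmup_fraction(fill, ts):.1%} spent warming up")
```

With these assumed numbers, a short timeslice ends before the working set is ever fully resident, which is the permanent cold-start regime the abstract describes, while a much longer timeslice amortizes the same fill cost over far more useful work.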