{"title":"High Performance Computing Co-Design Strategies","authors":"J. Ang","doi":"10.1145/2818950.2818959","DOIUrl":"https://doi.org/10.1145/2818950.2818959","url":null,"abstract":"The MEMSYS Call for Papers contains this passage: Many of the problems we see in the memory system are cross-disciplinary in nature -- their solution would likely require work at all levels, from applications to circuits. Thus, while the scope of the problem is memory, the scope of the solutions will be much wider. The Department of Energy's (DOE) high performance computing (HPC) community is thinking about how to define, support and execute work at all levels for the development of future supercomputers to run our portfolio of mission applications. Borrowing a concept from embedded computing, the DOE HPC community is calling our work at all levels co-design [1]. Co-design for embedded computing is focused on hardware/software partitioning of activities to execute a well-defined task within specific constraints. Co-design for general-purpose HPC has many dimensions for both the work to be performed and the constraints, e.g. hardware designs, runtime software, applications and algorithms. The subject of this extended abstract is a description of two alternative DOE HPC co-design strategies. While DOE co-design efforts include more than the memory system, as noted in the MEMSYS call, the memory system impacts applications, circuits and all levels between.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123830751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Anatomy of GPU Memory System for Multi-Application Execution","authors":"Adwait Jog, Onur Kayiran, Tuba Kesten, Ashutosh Pattnaik, Evgeny Bolotin, Niladrish Chatterjee, S. Keckler, M. Kandemir, C. Das","doi":"10.1145/2818950.2818979","DOIUrl":"https://doi.org/10.1145/2818950.2818979","url":null,"abstract":"As GPUs make headway in the computing landscape spanning mobile platforms, supercomputers, cloud and virtual desktop platforms, supporting concurrent execution of multiple applications in GPUs becomes essential for unlocking their full potential. However, unlike CPUs, multi-application execution in GPUs is little explored. In this paper, we study the memory system of GPUs in a concurrently executing multi-application environment. We first present an analytical performance model for many-threaded architectures and show that the common use of misses-per-kilo-instruction (MPKI) as a proxy for performance is not accurate without considering the bandwidth usage of applications. We characterize the memory interference of applications and discuss the limitations of existing memory schedulers in mitigating this interference. We extend the analytical model to multiple applications and identify the key metrics to control various performance metrics. We conduct extensive simulations using an enhanced version of GPGPU-Sim targeted for concurrently executing multiple applications, and show that memory scheduling decisions based on MPKI and bandwidth information are more effective in enhancing throughput compared to the traditional FR-FCFS and the recently proposed RR FR-FCFS policies.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"68 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125378806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Omitting Refresh: A Case Study for Commodity and Wide I/O DRAMs","authors":"Matthias Jung, Éder F. Zulian, Deepak M. Mathew, M. Herrmann, Christian Brugger, C. Weis, N. Wehn","doi":"10.1145/2818950.2818964","DOIUrl":"https://doi.org/10.1145/2818950.2818964","url":null,"abstract":"Dynamic Random Access Memories (DRAM) have a big impact on performance and contribute significantly to the total power consumption in systems ranging from mobile devices to servers. Up to half of the power consumption of future high density DRAM devices will be caused by refresh commands. Moreover, not only the refresh rate does depend on the device capacity, but it strongly depends on the temperature as well. In case of 3D integration of MPSoCs with Wide I/O DRAMs the power density and thermal dissipation are increased dramatically. Hence, in 3D-DRAM even more DRAM refresh operations are required. To master these challenges, clever DRAM refresh strategies are mandatory either on hardware or on software level using new or already available infrastructures and implementations, such as Partial Array Self Refresh (PASR) or Temperature Compensated Self Refresh (TCSR). In this paper, we show that for dedicated applications refresh can be disabled completely without or with negligible impact on the application performance. This is possible if it is assured that either the lifetime of the data is shorter than the currently required DRAM refresh period or if the application can tolerate bit errors to some degree in a given time window.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129897125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"k-Means Clustering on Two-Level Memory Systems","authors":"M. A. Bender, Jonathan W. Berry, S. Hammond, Branden J. Moore, Benjamin Moseley, C. Phillips","doi":"10.1145/2818950.2818977","DOIUrl":"https://doi.org/10.1145/2818950.2818977","url":null,"abstract":"In recent work we quantified the anticipated performance boost when a sorting algorithm is modified to leverage user-addressable \"near-memory,\" which we call scratchpad. This architectural feature is expected in the Intel Knight's Landing processors that will be used in DOE's next large-scale supercomputer. This paper expands our analytical study of the scratchpad to consider k-means clustering, a classical data-analysis technique that is ubiquitous in the literature and in practice. We present new theoretical results using the model introduced in [13], which measures memory transfers and assumes that computations are memory-bound. Our theoretical results indicate that scratchpad-aware versions of k-means clustering can expect performance boosts for high-dimensional instances with relatively few cluster centers. These constraints may limit the practical impact of scratch-pad for k-means acceleration, so we discuss their origins and practical implications. We corroborate our theory with experimental runs on a system instrumented to mimic one with scratchpad memory. We also contribute a semi-formalization of the computational properties that are necessary and sufficient to predict a performance boost from scratchpad-aware variants of algorithms. We have observed and studied these properties in the context of sorting, and now clustering. We conclude with some thoughts on the application of these properties to new areas. Specifically, we believe that dense linear algebra has similar properties to k-means, while sparse linear algebra and FFT computations are more similar to sorting. The sparse operations are more common in scientific computing, so we expect scratchpad to have significant impact in that area.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121216289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MMC: a Many-core Memory Connection Model","authors":"C. Ding, Hao Lu, Chencheng Ye","doi":"10.1145/2818950.2818958","DOIUrl":"https://doi.org/10.1145/2818950.2818958","url":null,"abstract":"This extended abstract formulates a model of parallel performance called MMC. It gives the theoretical upper bound of parallel performance based on three factors: the processing capacity, the network capacity, and the memory capacity.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134506276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding Energy Aspects of Processing-near-Memory for HPC Workloads","authors":"Hyojong Kim, Hyesoon Kim, S. Yalamanchili, Arun Rodrigues","doi":"10.1145/2818950.2818985","DOIUrl":"https://doi.org/10.1145/2818950.2818985","url":null,"abstract":"Interests in the concept of processing-near-memory (PNM) have been reignited with recent improvements of the 3D integration technology. In this work, we analyze the energy consumption characteristics of a system which comprises a conventional processor and a 3D memory stack with fully-programmable cores. We construct a high-level analytical energy model based on the underlying architecture and the technology with which each component is built. From the preliminary experiments with 11 HPC benchmarks from Mantevo benchmark suite, we observed that misses per kilo instructions (MPKI) of last-level cache (LLC) is one of the most important characteristics in determining the friendliness of the application to the PNM execution.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130869836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Data Centric Perspective on Memory Placement","authors":"Y. Birk, O. Mencer","doi":"10.1145/2818950.2818956","DOIUrl":"https://doi.org/10.1145/2818950.2818956","url":null,"abstract":"In this paper, we focus on memory in its role as a channel for passing information from one instruction to another; in particular, in conjunction with spatial or dataflow computing architectures, wherein the computing elements are laid out like an assembly plant. We point out the opportunity to dramatically increase effective data access bandwidth by going from a centralized memory array model with a few ports to numerous tiny buffers that can be accessed concurrently. The penalty is loss in access flexibility, but this flexibility is often a by-product of the memory organization rather than a true need. The improvements in hardware reconfiguration speed and resolution, combined with definition of standard buffer queuing and routing capabilities and efforts by tool designers and application developers are likely to extend the applicability of those architectures, offering dramatic power-cost-performance advantages.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128685486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Data Movement in the Memory Hierarchy in HPC Systems","authors":"Aditya M. Deshpande, J. Draper","doi":"10.1145/2818950.2818972","DOIUrl":"https://doi.org/10.1145/2818950.2818972","url":null,"abstract":"Increasing core counts and cache sizes in modern processors are causing data movement across the memory hierarchy to increase. With High Performance Computing (HPC) systems becoming more and more energy constrained, improving energy efficiency is becoming a necessity. Given its significant impact on system energy efficiency, the data movement costs in terms of energy and performance cannot be neglected. Conventional techniques for modeling and analyzing data movement across the memory hierarchy have proven to be inadequate in helping computer architects and system designers to optimize data movement. Our work is a position statement emphasizing the need for more detailed data movement modeling tools that better quantify how data movement across the memory hierarchy during application execution affects energy and performance. The hope is that exposing more detailed characteristics about the data movement would enable designers to optimize applications and architectures for minimizing data movement and in turn reduce energy and perhaps even increase performance.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133540195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HpMC: An Energy-aware Management System of Multi-level Memory Architectures","authors":"Chun-Yi Su, D. Roberts, E. León, K. Cameron, B. D. Supinski, G. Loh, Dimitrios S. Nikolopoulos","doi":"10.1145/2818950.2818974","DOIUrl":"https://doi.org/10.1145/2818950.2818974","url":null,"abstract":"DRAM technology faces density and power challenges to increase capacity because of limitations of physical cell design. To overcome these limitations, system designers are exploring alternative solutions that combine DRAM and emerging NVRAM technologies. Previous work on heterogeneous memories focuses, mainly, on two system designs: PCache, a hierarchical, inclusive memory system, and HRank, a flat, non-inclusive memory system. We demonstrate that neither of these designs can universally achieve high performance and energy efficiency across a suite of HPC workloads. In this work, we investigate the impact of a number of multi-level memory designs on the performance, power, and energy consumption of applications. To achieve this goal and overcome the limited number of available tools to study heterogeneous memories, we created HMsim, an infrastructure that enables n-level, heterogeneous memory studies by leveraging existing memory simulators. We, then, propose HpMC, a new memory controller design that combines the best aspects of existing management policies to improve performance and energy. Our energy-aware memory management system dynamically switches between PCache and HRank based on the temporal locality of applications. Our results show that HpMC reduces energy consumption from 13% to 45% compared to PCache and HRank, while providing the same bandwidth and higher capacity than a conventional DRAM system.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127213158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shared Last-Level Caches and The Case for Longer Timeslices","authors":"Viacheslav V. Fedorov, A. Reddy, Paul V. Gratz","doi":"10.1145/2818950.2818968","DOIUrl":"https://doi.org/10.1145/2818950.2818968","url":null,"abstract":"Memory performance is important in modern systems. Contention at various levels in memory hierarchy can lead to significant application performance degradation due to interference. Further, modern, large, last-level caches (LLC) have fill times greater than the OS scheduling window. When several threads are running concurrently and timesharing the CPU cores, they may never be able to load their working sets into the cache before being rescheduled, thus permanently stuck in the \"cold-start\" regime. We show that by increasing the system scheduling timeslice length it is possible to amortize the cache cold-start penalty due to the multitasking and improve application performance by 10--15%.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126580304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}