2022 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC): Latest Publications

Assessing the Memory Wall in Complex Codes
2022 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC). Pub Date: 2022-11-01. DOI: 10.1109/MCHPC56545.2022.00009
G. Shipman, Jered Dominguez-Trujillo, K. Sheridan, S. Swaminarayan
Abstract: Many of Los Alamos National Laboratory's (LANL) High Performance Computing (HPC) codes are heavily memory bandwidth bound. These codes often exhibit high levels of sparse memory access that differ significantly from industry-standard benchmarks such as STREAM and GUPS. In this paper we present an analysis of some of our most important code bases and their memory access patterns. From this analysis we generate representative micro-benchmarks that preserve the memory access characteristics of our codes using two approaches: one based on statistical sampling of relative memory offsets in a sliding time window at the function level, and another at the loop level. The function-level approach is used to assess the impact of advanced memory technologies such as LPDDR5 and HBM3 using the gem5 [1] simulator. Our simulation results show significant improvements for sparse memory access workloads using HBM3 relative to LPDDR5, and better scaling on a per-core basis. Assessment of two different CPU architectures shows that significantly higher peak memory bandwidth yields higher bandwidth on sparse workloads. These two assessments demonstrate the benefits of this workload characterization technique in memory system design and evaluation.
Citations: 1
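The sliding-window sampling described in the abstract can be illustrated with a small sketch: record the stream of addresses a function touches and histogram each new address's byte offset relative to the addresses still in the window. Replaying draws from that histogram yields a micro-benchmark with a similar spatial-locality profile. This is a minimal illustration under our own assumptions (window size, per-pair offsets), not the authors' actual tool:

```cpp
// Minimal sketch: histogram relative memory offsets in a sliding window.
// Window size and bucketing are assumptions, not the paper's parameters.
#include <cstdint>
#include <cstdio>
#include <deque>
#include <map>

class OffsetSampler {
    std::deque<uintptr_t> window_;        // last W addresses seen
    std::map<int64_t, uint64_t> hist_;    // relative offset (bytes) -> count
    size_t max_window_;
public:
    explicit OffsetSampler(size_t w) : max_window_(w) {}

    // Record one access; histogram its offset relative to every
    // address still in the sliding window.
    void record(const void* addr) {
        auto a = reinterpret_cast<uintptr_t>(addr);
        for (uintptr_t prev : window_)
            ++hist_[static_cast<int64_t>(a) - static_cast<int64_t>(prev)];
        window_.push_back(a);
        if (window_.size() > max_window_) window_.pop_front();
    }

    void dump() const {
        for (const auto& [offset, count] : hist_)
            std::printf("%+lld bytes: %llu\n",
                        (long long)offset, (unsigned long long)count);
    }
};

int main() {
    OffsetSampler sampler(8);            // hypothetical window of 8 accesses
    int data[1024];
    // A strided traversal: the histogram is dominated by one offset.
    for (int i = 0; i < 1024; i += 4) sampler.record(&data[i]);
    sampler.dump();
}
```

A dense streaming kernel concentrates the histogram at one small offset, while a sparse gather spreads it widely; that spread is what distinguishes these codes from STREAM-like benchmarks.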
Evaluating Emerging CXL-enabled Memory Pooling for HPC Systems
2022 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC). Pub Date: 2022-11-01. DOI: 10.1109/MCHPC56545.2022.00007
Jacob Wahlgren, M. Gokhale, I. Peng
Abstract: Current HPC systems provide memory resources that are statically configured and tightly coupled with compute nodes. However, workloads on HPC systems are evolving, and diverse workloads lead to a need for configurable memory resources to achieve high performance and utilization. In this study, we evaluate a memory subsystem design leveraging CXL-enabled memory pooling. Two promising use cases of composable memory subsystems are studied: fine-grained capacity provisioning and scalable bandwidth provisioning. We developed an emulator to explore the performance impact of various memory compositions, and we provide a profiler to identify the memory usage patterns in applications and their optimization opportunities. Seven scientific and six graph applications are evaluated on various emulated memory configurations. Three of the seven scientific applications had less than 10% performance impact when the pooled memory backed 75% of their memory footprint. The results also show that a dynamically configured high-bandwidth system can effectively support bandwidth-intensive unstructured mesh-based applications like OpenFOAM. Finally, we identify interference through shared memory pools as a practical challenge for adoption on HPC systems.
Citations: 9
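The paper's emulator is not reproduced here, but a common way to emulate pooled (farther, slower) memory on a stock two-socket node is to run compute on one NUMA node and back a chosen fraction of the working set with the remote node. The sketch below uses libnuma that way; the 75% pooled split mirrors the configuration mentioned in the abstract, while the NUMA-based approach itself and the node numbers are our illustrative assumptions, not necessarily the authors' method:

```cpp
// Sketch: emulate a memory pool by backing part of a working set with a
// remote NUMA node (higher latency, lower bandwidth than local DRAM).
// This mirrors a common emulation approach; it is not the paper's emulator.
// Build: g++ -O2 pool_emu.cpp -lnuma
#include <numa.h>
#include <cstdio>
#include <cstring>

int main() {
    if (numa_available() < 0) { std::fprintf(stderr, "no NUMA\n"); return 1; }

    const size_t total = 1ull << 30;          // 1 GiB working set
    const double pooled_fraction = 0.75;      // split used in the abstract
    size_t pooled = (size_t)(total * pooled_fraction);
    size_t local  = total - pooled;

    // "Local" memory on node 0; "pooled" memory emulated by node 1.
    // Node numbers are assumptions about the test machine.
    char* near_mem = (char*)numa_alloc_onnode(local, 0);
    char* pool_mem = (char*)numa_alloc_onnode(pooled, 1);
    if (!near_mem || !pool_mem) { std::fprintf(stderr, "alloc failed\n"); return 1; }

    // Touch pages so placement actually happens.
    std::memset(near_mem, 0, local);
    std::memset(pool_mem, 0, pooled);

    // ... run the application kernel against near_mem/pool_mem here and
    // compare timings against an all-local baseline ...

    numa_free(near_mem, local);
    numa_free(pool_mem, pooled);
    return 0;
}
```

Sweeping pooled_fraction and re-timing the kernel reproduces, in spirit, the capacity-provisioning experiment the abstract reports.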
Maximizing Performance Through Memory Hierarchy-Driven Data Layout Transformations
2022 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC). Pub Date: 2022-11-01. DOI: 10.1109/MCHPC56545.2022.00006
B. Sepanski, Tuowen Zhao, H. Johansen, Samuel Williams
Abstract: Computations on structured grids using standard multidimensional array layouts can incur substantial data-movement costs through the memory hierarchy. This paper explores the benefits of using a framework (Bricks) to separate the complexity of data layout and optimized communication from the functional representation. To that end, we provide three novel contributions and evaluate them on several kernels taken from GENE, a phase-space fusion tokamak simulation code. We extend Bricks to support 6-dimensional arrays and kernels that operate on complex data types, and integrate Bricks with cuFFT. We demonstrate how to optimize Bricks for data reuse, spatial locality, and GPU hardware utilization, achieving up to a 2.67× speedup on a single A100 GPU. We conclude with insights on how to rearchitect memory subsystems.
Citations: 0
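The core idea behind a bricked layout is to replace a row-major array with small fixed-size sub-blocks stored contiguously, so a stencil's neighborhood lives in a few contiguous cache lines instead of being scattered across long rows. The 2D sketch below is our own simplification of that transformation; the actual Bricks framework generalizes to 6D arrays, complex data types, vectorized intra-brick layouts, and GPUs:

```cpp
// Sketch of a bricked 2D layout: the grid is stored as BX x BY tiles,
// each contiguous in memory. A simplified illustration of the idea behind
// Bricks, not the framework's real data structure.
#include <cstddef>
#include <vector>

constexpr int BX = 8, BY = 8;    // brick extents (assumed, for illustration)

struct BrickedGrid2D {
    int nx, ny;                  // grid size, assumed multiples of BX, BY
    std::vector<double> data;    // bricks laid out one after another

    BrickedGrid2D(int nx_, int ny_)
        : nx(nx_), ny(ny_), data((size_t)nx_ * ny_, 0.0) {}

    // Map (i, j) to its brick, then to the element inside that brick.
    double& at(int i, int j) {
        int bi = i / BX, bj = j / BY;          // brick coordinates
        int li = i % BX, lj = j % BY;          // intra-brick coordinates
        size_t brick = (size_t)bj * (nx / BX) + bi;
        return data[brick * (BX * BY) + (size_t)lj * BX + li];
    }
};

// A 5-point stencil touches at most 3 bricks near brick corners and only
// 1 in the interior, so its footprint stays in a few 512-byte blocks.
void smooth(BrickedGrid2D& out, BrickedGrid2D& in) {
    for (int j = 1; j < in.ny - 1; ++j)
        for (int i = 1; i < in.nx - 1; ++i)
            out.at(i, j) = 0.25 * (in.at(i - 1, j) + in.at(i + 1, j) +
                                   in.at(i, j - 1) + in.at(i, j + 1));
}
```

With 8×8 bricks of doubles, each brick is exactly 512 bytes, so a brick maps onto a small, fixed set of cache lines regardless of the overall grid width.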
Reducing Memory-Bus Energy Consumption of GPUs via Software-Based Bit-Flip Minimization
2022 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC). Pub Date: 2022-11-01. DOI: 10.1109/MCHPC56545.2022.00008
Alex Fallin, Martin Burtscher
Abstract: Energy consumption is a major concern in high-performance computing. One important contributing factor is the number of times the wires are charged and discharged, i.e., how often they switch from '0' to '1' and vice versa. We describe a software technique to minimize this switching activity in GPUs, thereby lowering the energy usage. Our technique targets the memory bus, which comprises many high-capacitance wires that are frequently used. Our approach is to strategically change data values in the source code such that loading and storing them yields fewer bit flips. The new values are guaranteed to produce the same control flow and program output. Measurements on GPUs from two generations show that our technique allows programmers to save up to 9.3% of whole-GPU energy consumption, and 1.2% on average across eight graph-analytics CUDA codes, without impacting performance.
Citations: 0
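The switching activity this paper minimizes can be modeled directly: a bus transferring words w1, w2, ... toggles popcount(w_k XOR w_{k+1}) wires per transfer. The sketch below counts toggles for a data stream and compares two program-equivalent value encodings, such as marking "visited" entries with 1 versus all-ones in a graph code; the specific encodings are our illustrative assumptions, not the paper's actual source-level transformations:

```cpp
// Sketch: model memory-bus switching activity as the Hamming distance
// between consecutively transferred words. The value choices below are
// illustrative; the paper derives its rewrites from real CUDA codes.
#include <bit>
#include <cstdint>
#include <cstdio>
#include <vector>

// Total wire toggles on a bus that transfers the words in order.
uint64_t bus_flips(const std::vector<uint64_t>& words) {
    uint64_t flips = 0, prev = 0;          // assume bus starts at all zeros
    for (uint64_t w : words) {
        flips += std::popcount(prev ^ w);  // wires that toggle this transfer
        prev = w;
    }
    return flips;
}

int main() {
    // A "visited" flag array streamed to memory, mostly-unvisited.
    // Encoding A: visited = 1 (one wire differs from the 0 background).
    // Encoding B: visited = ~0ull (all 64 wires differ). Both give the
    // same control flow if the code only ever tests flag != 0.
    std::vector<uint64_t> a(1024, 0), b(1024, 0);
    for (size_t i = 0; i < a.size(); i += 7) { a[i] = 1; b[i] = ~0ull; }

    std::printf("encoding A flips: %llu\n", (unsigned long long)bus_flips(a));
    std::printf("encoding B flips: %llu\n", (unsigned long long)bus_flips(b));
}
```

Each marked word under encoding A costs 2 toggles (one bit on, one bit off on the next transfer) versus 128 under encoding B, which is the kind of gap the source-level value rewrites exploit.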