Assessing the Memory Wall in Complex Codes
G. Shipman, Jered Dominguez-Trujillo, K. Sheridan, S. Swaminarayan
2022 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC), published 2022-11-01
DOI: 10.1109/MCHPC56545.2022.00009 (https://doi.org/10.1109/MCHPC56545.2022.00009)
Citations: 1
Abstract
Many of Los Alamos National Laboratory's (LANL) High Performance Computing (HPC) codes are heavily memory-bandwidth bound. These codes often exhibit high levels of sparse memory access, a pattern that differs significantly from those of industry-standard benchmarks such as STREAM and GUPS. In this paper we present an analysis of some of our most important codebases and their memory access patterns. From this analysis we generate representative micro-benchmarks that preserve the memory access characteristics of our codes using two approaches: one based on statistical sampling of relative memory offsets in a sliding time window at the function level, and another at the loop level. The function-level approach is used to assess the impact of advanced memory technologies such as LPDDR5 and HBM3 using the gem5 [1] simulator. Our simulation results show significant improvements for sparse memory access workloads using HBM3 relative to LPDDR5, and better scaling on a per-core basis. Assessment of two different CPU architectures shows that significantly higher peak memory bandwidth results in high bandwidth on sparse workloads. These two assessments demonstrate the benefits of this workload characterization technique in memory system design and evaluation.
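To make the idea of sampling relative memory offsets in a sliding window concrete, the following is a minimal sketch and not the authors' implementation. It assumes the trace is a plain list of integer addresses, that the window is measured in number of accesses rather than in time, and that offsets are taken between consecutive accesses. The names `sliding_offset_histograms` and `synthesize` are hypothetical.

```python
import random
from collections import Counter


def sliding_offset_histograms(trace, window):
    """Yield one histogram of consecutive-access offsets per window position.

    Assumption: the window is a count of accesses, not a time interval.
    """
    for i in range(len(trace) - window + 1):
        chunk = trace[i:i + window]
        yield Counter(b - a for a, b in zip(chunk, chunk[1:]))


def synthesize(hist, length, start=0, seed=0):
    """Generate a synthetic address stream by sampling offsets from `hist`.

    Each offset is drawn with probability proportional to its observed count.
    """
    rng = random.Random(seed)
    offsets = list(hist.keys())
    weights = list(hist.values())
    addr = start
    out = [addr]
    for _ in range(length - 1):
        addr += rng.choices(offsets, weights=weights, k=1)[0]
        out.append(addr)
    return out


# A unit-stride, STREAM-like trace of 8-byte elements (illustrative data only).
stream_like = [0, 8, 16, 24, 32]
stream_hists = list(sliding_offset_histograms(stream_like, window=3))
stream_total = sum(stream_hists, Counter())
stream_synth = synthesize(stream_total, length=4)

# A sparse trace that jumps between distant addresses (illustrative data only).
sparse_like = [0, 4096, 8, 8192, 16]
sparse_total = sum(sliding_offset_histograms(sparse_like, window=5), Counter())
```

For the unit-stride trace every window contains only the offset 8, so the synthetic stream is again unit-stride; the sparse trace produces a histogram with several large positive and negative offsets, which is the kind of distinction such a micro-benchmark generator would need to preserve.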