DRAM Bandwidth and Latency Stacks: Visualizing DRAM Bottlenecks

Stijn Eyerman, W. Heirman, I. Hur
{"title":"DRAM带宽和延迟堆栈:可视化DRAM瓶颈","authors":"Stijn Eyerman, W. Heirman, I. Hur","doi":"10.1109/ispass55109.2022.00045","DOIUrl":null,"url":null,"abstract":"For memory-bound applications, memory bandwidth utilization and memory access latency determine performance. DRAM specifications mention the maximum peak bandwidth and uncontended read latency, but this number is never achieved in practice. Many factors impact the actually achieved bandwidth, and it is often not obvious to hardware architects or software developers how higher bandwidth usage, and thus higher performance, can be achieved. Similarly, latency is impacted by numerous technology constraints and queueing in the memory controller.DRAM bandwidth stacks intuitively visualize the memory bandwidth consumption of an application and indicate where potential bandwidth is lost. The top of the stack is the peak bandwidth, while the bottom component shows the actually achieved bandwidth. The other components show how much bandwidth is wasted on DRAM refresh, precharge and activate commands, or because of (parts of) the DRAM chip being idle when there are no memory operations available. DRAM latency stacks show the average latency of a memory read operation, divided into base read time, row conflict, and multiple queue components. DRAM bandwidth and latency stacks are complementary to CPI stacks and speedup stacks, providing additional insight to optimize the performance of an application or to improve the hardware.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DRAM Bandwidth and Latency Stacks: Visualizing DRAM Bottlenecks\",\"authors\":\"Stijn Eyerman, W. Heirman, I. Hur\",\"doi\":\"10.1109/ispass55109.2022.00045\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"For memory-bound applications, memory bandwidth utilization and memory access latency determine performance. DRAM specifications mention the maximum peak bandwidth and uncontended read latency, but this number is never achieved in practice. Many factors impact the actually achieved bandwidth, and it is often not obvious to hardware architects or software developers how higher bandwidth usage, and thus higher performance, can be achieved. Similarly, latency is impacted by numerous technology constraints and queueing in the memory controller.DRAM bandwidth stacks intuitively visualize the memory bandwidth consumption of an application and indicate where potential bandwidth is lost. The top of the stack is the peak bandwidth, while the bottom component shows the actually achieved bandwidth. The other components show how much bandwidth is wasted on DRAM refresh, precharge and activate commands, or because of (parts of) the DRAM chip being idle when there are no memory operations available. DRAM latency stacks show the average latency of a memory read operation, divided into base read time, row conflict, and multiple queue components. 
DRAM bandwidth and latency stacks are complementary to CPI stacks and speedup stacks, providing additional insight to optimize the performance of an application or to improve the hardware.\",\"PeriodicalId\":115391,\"journal\":{\"name\":\"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ispass55109.2022.00045\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ispass55109.2022.00045","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

For memory-bound applications, memory bandwidth utilization and memory access latency determine performance. DRAM specifications mention the maximum peak bandwidth and uncontended read latency, but this number is never achieved in practice. Many factors impact the actually achieved bandwidth, and it is often not obvious to hardware architects or software developers how higher bandwidth usage, and thus higher performance, can be achieved. Similarly, latency is impacted by numerous technology constraints and queueing in the memory controller. DRAM bandwidth stacks intuitively visualize the memory bandwidth consumption of an application and indicate where potential bandwidth is lost. The top of the stack is the peak bandwidth, while the bottom component shows the actually achieved bandwidth. The other components show how much bandwidth is wasted on DRAM refresh, precharge and activate commands, or because of (parts of) the DRAM chip being idle when there are no memory operations available. DRAM latency stacks show the average latency of a memory read operation, divided into base read time, row conflict, and multiple queue components. DRAM bandwidth and latency stacks are complementary to CPI stacks and speedup stacks, providing additional insight to optimize the performance of an application or to improve the hardware.
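
The abstract describes the two stacks as decompositions: the bandwidth stack splits peak DRAM bandwidth into an achieved component plus lost components (refresh, precharge/activate, idle), and the latency stack splits the average read latency into base read time, row-conflict time, and queueing time. As a rough illustration only, the sketch below assembles both stacks as stacked bar charts from made-up numbers; the component names, the values, and the plot_stack helper are assumptions for this example, not output of the paper's methodology or of any real profiling tool.

```python
# A minimal sketch of how DRAM bandwidth and latency stacks could be drawn
# from hypothetical memory-controller measurements. All names and numbers
# below are illustrative assumptions, not values from the paper.
import matplotlib.pyplot as plt

# Peak theoretical bandwidth of the hypothetical DRAM configuration (GB/s).
PEAK_BW_GBPS = 100.0

# Hypothetical breakdown of where the peak bandwidth goes. The bottom
# component is the bandwidth the application actually achieved; the other
# components are "lost" bandwidth, and all components sum to the peak.
bandwidth_components = {
    "achieved":            38.0,  # useful data actually transferred
    "refresh":              4.0,  # bus unusable during refresh
    "precharge/activate":  13.0,  # row misses forcing PRE/ACT commands
    "idle (no requests)":  45.0,  # no memory operations available
}
assert abs(sum(bandwidth_components.values()) - PEAK_BW_GBPS) < 1e-6

# Hypothetical breakdown of the average read latency (ns): base access time
# plus extra time from row conflicts and queueing in the memory controller.
latency_components = {
    "base read":        45.0,
    "row conflict":     20.0,
    "controller queue": 35.0,
}

def plot_stack(ax, components, title, unit):
    """Draw one stacked bar whose segments are the given components."""
    bottom = 0.0
    for name, value in components.items():
        ax.bar(0, value, bottom=bottom, label=name)
        bottom += value
    ax.set_title(title)
    ax.set_ylabel(unit)
    ax.set_xticks([])
    ax.legend(fontsize=8)

fig, (ax_bw, ax_lat) = plt.subplots(1, 2, figsize=(7, 4))
plot_stack(ax_bw, bandwidth_components, "DRAM bandwidth stack", "GB/s")
plot_stack(ax_lat, latency_components, "DRAM latency stack", "ns")
fig.tight_layout()
plt.show()
```

Read bottom-up: the gap between the achieved component and the top of the bar is the available headroom. A large precharge/activate share typically points at poor row-buffer locality, while a large idle share suggests the application is not issuing enough concurrent memory requests to keep the DRAM busy.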