Access order and effective bandwidth for streams on a Direct Rambus memory

Sung I. Hong, S. Mckee, M. H. Salinas, R. Klenke, J. Aylor, W. Wulf
{"title":"Access order and effective bandwidth for streams on a Direct Rambus memory","authors":"Sung I. Hong, S. Mckee, M. H. Salinas, R. Klenke, J. Aylor, W. Wulf","doi":"10.1109/HPCA.1999.744337","DOIUrl":null,"url":null,"abstract":"Processor speeds are increasing rapidly and memory speeds are not keeping up. Streaming computations (such as multimedia or scientific applications) are among those whose performance is most limited by the memory bottleneck. Rambus hopes to bridge the processor/memory performance gap with a recently introduced DRAM that can deliver up to 1.6 Gbytes/sec. We analyze the performance of these interesting new memory devices on the inner loops of streaming computations, both for traditional memory controllers that treat all DRAM transactions as random cacheline accesses, and for controllers augmented with streaming hardware. For our benchmarks, we find that accessing unit-stride streams in cacheline bursts in the natural order of the computation exploits from 44-76% of the peak bandwidth of a memory system composed of a single Direct RDRAM device, and that accessing streams via a streaming mechanism with a simple access ordering scheme can improve performance by factors of 1.18 to 2.25.","PeriodicalId":287867,"journal":{"name":"Proceedings Fifth International Symposium on High-Performance Computer Architecture","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1999-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"66","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings Fifth International Symposium on High-Performance Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.1999.744337","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 66

Abstract

Processor speeds are increasing rapidly and memory speeds are not keeping up. Streaming computations (such as multimedia or scientific applications) are among those whose performance is most limited by the memory bottleneck. Rambus hopes to bridge the processor/memory performance gap with a recently introduced DRAM that can deliver up to 1.6 Gbytes/sec. We analyze the performance of these interesting new memory devices on the inner loops of streaming computations, both for traditional memory controllers that treat all DRAM transactions as random cacheline accesses, and for controllers augmented with streaming hardware. For our benchmarks, we find that accessing unit-stride streams in cacheline bursts in the natural order of the computation exploits from 44-76% of the peak bandwidth of a memory system composed of a single Direct RDRAM device, and that accessing streams via a streaming mechanism with a simple access ordering scheme can improve performance by factors of 1.18 to 2.25.
直接Rambus存储器上流的访问顺序和有效带宽
处理器速度正在迅速提高,而内存速度却跟不上。流计算(如多媒体或科学应用程序)是那些性能最受内存瓶颈限制的计算之一。Rambus公司希望通过最近推出的1.6 gb /s的DRAM来弥补处理器/内存性能的差距。我们分析了这些有趣的新存储设备在流计算内环上的性能,既适用于将所有DRAM事务视为随机缓存访问的传统内存控制器,也适用于使用流硬件增强的控制器。对于我们的基准测试,我们发现在cacheline爆发中以计算的自然顺序访问单位步长流利用了由单个Direct RDRAM设备组成的内存系统峰值带宽的44-76%,并且通过具有简单访问排序方案的流机制访问流可以将性能提高1.18到2.25倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信