Access order and effective bandwidth for streams on a Direct Rambus memory
Sung I. Hong, S. McKee, M. H. Salinas, R. Klenke, J. Aylor, W. Wulf
Proceedings Fifth International Symposium on High-Performance Computer Architecture (HPCA), 1999. DOI: 10.1109/HPCA.1999.744337
Processor speeds are increasing rapidly and memory speeds are not keeping up. Streaming computations (such as multimedia or scientific applications) are among those whose performance is most limited by the memory bottleneck. Rambus hopes to bridge the processor/memory performance gap with a recently introduced DRAM that can deliver up to 1.6 Gbytes/sec. We analyze the performance of these interesting new memory devices on the inner loops of streaming computations, both for traditional memory controllers that treat all DRAM transactions as random cacheline accesses, and for controllers augmented with streaming hardware. For our benchmarks, we find that accessing unit-stride streams in cacheline bursts in the natural order of the computation exploits 44% to 76% of the peak bandwidth of a memory system composed of a single Direct RDRAM device, and that accessing streams via a streaming mechanism with a simple access ordering scheme can improve performance by factors of 1.18 to 2.25.
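As a point of reference for what "unit-stride streams accessed in the natural order of the computation" means, the sketch below shows a daxpy-style inner loop in C. This is an assumed, illustrative kernel rather than one of the paper's actual benchmarks: each iteration touches x[i] and then y[i], so a conventional controller sees the two streams interleaved at cacheline granularity, whereas a streaming mechanism with access ordering could issue several consecutive cachelines from one stream before switching, which is the kind of reordering the reported 1.18x to 2.25x improvements come from.

#include <stddef.h>

/*
 * Unit-stride streaming kernel (hypothetical example, not taken from the
 * paper): y[i] += a * x[i]. In the natural computation order, reads of x
 * and read-modify-writes of y alternate, interleaving the two streams as
 * seen by the memory controller.
 */
void daxpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];   /* one cacheline-granularity touch per stream per iteration */
}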