Memory-system Design Considerations For Dynamically-scheduled Processors

K. Farkas, P. Chow, N. Jouppi, Z. Vranesic
{"title":"Memory-system Design Considerations For Dynamically-scheduled Processors","authors":"K. Farkas, P. Chow, N. Jouppi, Z. Vranesic","doi":"10.1145/264107.264156","DOIUrl":null,"url":null,"abstract":"In this paper, we identify performance trends and design relationships between the following components of the data memory hierarchy in a dynamically-scheduled processor: the register file, the lockup-free data cache, the stream buffers, and the interface between these components and the lower levels of the memory hierarchy. Similar performance was obtained from all systems having support for fewer than four in-flight misses, irrespective of the register-file size, the issue width of the processor, and the memory bandwidth. While providing support for more than four in-flight misses did increase system performance, the improvement was less than that obtained by increasing the number of registers. The addition of stream buffers to the investigated systems led to a significant performance increase, with the larger increases for systems having less in-flight-miss support, greater memory bandwidth, or more instruction issue capability. The performance of these systems was not significantly affected by the inclusion of traffic filters, dynamic-stride calculators, or the inclusion of the per-load non-unity stride-predictor and the incremental-prefetching techniques, which we introduce. However, the incremental prefetching technique reduces the bandwidth consumed by stream buffers by 50% without a significant impact on performance.","PeriodicalId":405506,"journal":{"name":"Conference Proceedings. The 24th Annual International Symposium on Computer Architecture","volume":"61 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1997-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"120","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference Proceedings. The 24th Annual International Symposium on Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/264107.264156","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 120

Abstract

In this paper, we identify performance trends and design relationships between the following components of the data memory hierarchy in a dynamically-scheduled processor: the register file, the lockup-free data cache, the stream buffers, and the interface between these components and the lower levels of the memory hierarchy. Similar performance was obtained from all systems having support for fewer than four in-flight misses, irrespective of the register-file size, the issue width of the processor, and the memory bandwidth. While providing support for more than four in-flight misses did increase system performance, the improvement was less than that obtained by increasing the number of registers. The addition of stream buffers to the investigated systems led to a significant performance increase, with the larger increases for systems having less in-flight-miss support, greater memory bandwidth, or more instruction issue capability. The performance of these systems was not significantly affected by the inclusion of traffic filters, dynamic-stride calculators, or the inclusion of the per-load non-unity stride-predictor and the incremental-prefetching techniques, which we introduce. However, the incremental prefetching technique reduces the bandwidth consumed by stream buffers by 50% without a significant impact on performance.
动态调度处理器的内存系统设计注意事项
在本文中,我们确定了动态调度处理器中数据内存层次结构的以下组件之间的性能趋势和设计关系:寄存器文件,无锁数据缓存,流缓冲区,以及这些组件与内存层次结构较低级别之间的接口。无论寄存器文件大小、处理器的问题宽度和内存带宽如何,所有支持少于4次飞行中失误的系统都获得了类似的性能。虽然为4次以上的飞行失误提供支持确实提高了系统性能,但这种改进不如增加寄存器数量所获得的改进。将流缓冲区添加到所研究的系统中可以显著提高性能,对于具有更少的飞行失误支持、更大的内存带宽或更多指令问题能力的系统,性能的提高幅度更大。这些系统的性能不会受到流量过滤器、动态步幅计算器、每负载非统一步幅预测器和增量预取技术的显著影响。然而,增量预取技术将流缓冲区消耗的带宽减少了50%,而对性能没有显著影响。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信