超标量处理中的存储系统设计

N. Lu, C. Chung
{"title":"超标量处理中的存储系统设计","authors":"N. Lu, C. Chung","doi":"10.1142/S0129053395000233","DOIUrl":null,"url":null,"abstract":"In this paper, we study the memory system design for superscalar processing. Benchmarking is used to examine the execution behavior of load/store instructions, such as load/store parallelism and memory load/store port utilization. It is found that the use of only a single load/store port forms a system bottle-neck. A superscalar processor benefits from multiple load/store ports and system performance saturates with two load/store ports. The memory system must be carefully designed if multiple load/store ports are supported in a superscalar processor. Thus, we consider the design of the data cache subsystem. The data cache configurations we investigate include multiported cache, multibank cache, and duplicated cache. Through benchmarking, we find that the duplicated cache performs well in most benchmarks. Yet the cost of a duplicated cache is higher. In a superscalar multiprocessing environment, in order to properly maintain memory consistency, we must consider the load/store ordering of the processors. In superscalar processors, the load/store ordering may be in one of three forms: total ordering, load bypassing, and load forwarding. In this research, we conclude that to support the sequential consistency model, the load/store instructions must be totally ordered. Load bypassing and load forwarding are sufficient to support the processor consistency model.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Memory System Design in Superscalar Processing\",\"authors\":\"N. Lu, C. Chung\",\"doi\":\"10.1142/S0129053395000233\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we study the memory system design for superscalar processing. Benchmarking is used to examine the execution behavior of load/store instructions, such as load/store parallelism and memory load/store port utilization. It is found that the use of only a single load/store port forms a system bottle-neck. A superscalar processor benefits from multiple load/store ports and system performance saturates with two load/store ports. The memory system must be carefully designed if multiple load/store ports are supported in a superscalar processor. Thus, we consider the design of the data cache subsystem. The data cache configurations we investigate include multiported cache, multibank cache, and duplicated cache. Through benchmarking, we find that the duplicated cache performs well in most benchmarks. Yet the cost of a duplicated cache is higher. In a superscalar multiprocessing environment, in order to properly maintain memory consistency, we must consider the load/store ordering of the processors. In superscalar processors, the load/store ordering may be in one of three forms: total ordering, load bypassing, and load forwarding. In this research, we conclude that to support the sequential consistency model, the load/store instructions must be totally ordered. Load bypassing and load forwarding are sufficient to support the processor consistency model.\",\"PeriodicalId\":270006,\"journal\":{\"name\":\"Int. J. High Speed Comput.\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1995-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Int. J. High Speed Comput.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1142/S0129053395000233\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. High Speed Comput.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/S0129053395000233","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

本文研究了超标量处理的存储系统设计。基准测试用于检查加载/存储指令的执行行为,例如加载/存储并行性和内存加载/存储端口利用率。研究发现,仅使用单个装载/存储端口形成了系统瓶颈。一个超标量处理器受益于多个负载/存储端口和系统性能饱和与两个负载/存储端口。如果在一个超标量处理器中支持多个加载/存储端口,则必须仔细设计存储系统。因此,我们考虑了数据缓存子系统的设计。我们研究的数据缓存配置包括多端口缓存、多银行缓存和重复缓存。通过基准测试,我们发现重复缓存在大多数基准测试中都表现良好。然而,重复缓存的成本更高。在超标量多处理环境中,为了适当地维护内存一致性,我们必须考虑处理器的加载/存储顺序。在超标量处理器中,加载/存储排序可以采用以下三种形式之一:完全排序、负载绕过和负载转发。在本研究中,我们得出结论,为了支持顺序一致性模型,加载/存储指令必须完全有序。负载绕过和负载转发足以支持处理器一致性模型。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Memory System Design in Superscalar Processing
In this paper, we study the memory system design for superscalar processing. Benchmarking is used to examine the execution behavior of load/store instructions, such as load/store parallelism and memory load/store port utilization. It is found that the use of only a single load/store port forms a system bottle-neck. A superscalar processor benefits from multiple load/store ports and system performance saturates with two load/store ports. The memory system must be carefully designed if multiple load/store ports are supported in a superscalar processor. Thus, we consider the design of the data cache subsystem. The data cache configurations we investigate include multiported cache, multibank cache, and duplicated cache. Through benchmarking, we find that the duplicated cache performs well in most benchmarks. Yet the cost of a duplicated cache is higher. In a superscalar multiprocessing environment, in order to properly maintain memory consistency, we must consider the load/store ordering of the processors. In superscalar processors, the load/store ordering may be in one of three forms: total ordering, load bypassing, and load forwarding. In this research, we conclude that to support the sequential consistency model, the load/store instructions must be totally ordered. Load bypassing and load forwarding are sufficient to support the processor consistency model.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信