超标量处理中的存储系统设计

Int. J. High Speed Comput. Pub Date : 1995-09-01 DOI:10.1142/S0129053395000233

N. Lu, C. Chung

{"title":"超标量处理中的存储系统设计","authors":"N. Lu, C. Chung","doi":"10.1142/S0129053395000233","DOIUrl":null,"url":null,"abstract":"In this paper, we study the memory system design for superscalar processing. Benchmarking is used to examine the execution behavior of load/store instructions, such as load/store parallelism and memory load/store port utilization. It is found that the use of only a single load/store port forms a system bottle-neck. A superscalar processor benefits from multiple load/store ports and system performance saturates with two load/store ports. The memory system must be carefully designed if multiple load/store ports are supported in a superscalar processor. Thus, we consider the design of the data cache subsystem. The data cache configurations we investigate include multiported cache, multibank cache, and duplicated cache. Through benchmarking, we find that the duplicated cache performs well in most benchmarks. Yet the cost of a duplicated cache is higher. In a superscalar multiprocessing environment, in order to properly maintain memory consistency, we must consider the load/store ordering of the processors. In superscalar processors, the load/store ordering may be in one of three forms: total ordering, load bypassing, and load forwarding. In this research, we conclude that to support the sequential consistency model, the load/store instructions must be totally ordered. Load bypassing and load forwarding are sufficient to support the processor consistency model.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Memory System Design in Superscalar Processing\",\"authors\":\"N. Lu, C. Chung\",\"doi\":\"10.1142/S0129053395000233\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we study the memory system design for superscalar processing. Benchmarking is used to examine the execution behavior of load/store instructions, such as load/store parallelism and memory load/store port utilization. It is found that the use of only a single load/store port forms a system bottle-neck. A superscalar processor benefits from multiple load/store ports and system performance saturates with two load/store ports. The memory system must be carefully designed if multiple load/store ports are supported in a superscalar processor. Thus, we consider the design of the data cache subsystem. The data cache configurations we investigate include multiported cache, multibank cache, and duplicated cache. Through benchmarking, we find that the duplicated cache performs well in most benchmarks. Yet the cost of a duplicated cache is higher. In a superscalar multiprocessing environment, in order to properly maintain memory consistency, we must consider the load/store ordering of the processors. In superscalar processors, the load/store ordering may be in one of three forms: total ordering, load bypassing, and load forwarding. In this research, we conclude that to support the sequential consistency model, the load/store instructions must be totally ordered. Load bypassing and load forwarding are sufficient to support the processor consistency model.\",\"PeriodicalId\":270006,\"journal\":{\"name\":\"Int. J. High Speed Comput.\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1995-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Int. J. High Speed Comput.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1142/S0129053395000233\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. High Speed Comput.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/S0129053395000233","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

本文研究了超标量处理的存储系统设计。基准测试用于检查加载/存储指令的执行行为，例如加载/存储并行性和内存加载/存储端口利用率。研究发现，仅使用单个装载/存储端口形成了系统瓶颈。一个超标量处理器受益于多个负载/存储端口和系统性能饱和与两个负载/存储端口。如果在一个超标量处理器中支持多个加载/存储端口，则必须仔细设计存储系统。因此，我们考虑了数据缓存子系统的设计。我们研究的数据缓存配置包括多端口缓存、多银行缓存和重复缓存。通过基准测试，我们发现重复缓存在大多数基准测试中都表现良好。然而，重复缓存的成本更高。在超标量多处理环境中，为了适当地维护内存一致性，我们必须考虑处理器的加载/存储顺序。在超标量处理器中，加载/存储排序可以采用以下三种形式之一:完全排序、负载绕过和负载转发。在本研究中，我们得出结论，为了支持顺序一致性模型，加载/存储指令必须完全有序。负载绕过和负载转发足以支持处理器一致性模型。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Memory System Design in Superscalar Processing

In this paper, we study the memory system design for superscalar processing. Benchmarking is used to examine the execution behavior of load/store instructions, such as load/store parallelism and memory load/store port utilization. It is found that the use of only a single load/store port forms a system bottle-neck. A superscalar processor benefits from multiple load/store ports and system performance saturates with two load/store ports. The memory system must be carefully designed if multiple load/store ports are supported in a superscalar processor. Thus, we consider the design of the data cache subsystem. The data cache configurations we investigate include multiported cache, multibank cache, and duplicated cache. Through benchmarking, we find that the duplicated cache performs well in most benchmarks. Yet the cost of a duplicated cache is higher. In a superscalar multiprocessing environment, in order to properly maintain memory consistency, we must consider the load/store ordering of the processors. In superscalar processors, the load/store ordering may be in one of three forms: total ordering, load bypassing, and load forwarding. In this research, we conclude that to support the sequential consistency model, the load/store instructions must be totally ordered. Load bypassing and load forwarding are sufficient to support the processor consistency model.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Int. J. High Speed Comput.

自引率

0.00%

发文量