Memory System Design in Superscalar Processing
N. Lu, C. Chung
Int. J. High Speed Comput., published 1995-09-01
DOI: 10.1142/S0129053395000233 (https://doi.org/10.1142/S0129053395000233)

In this paper, we study memory system design for superscalar processing. We use benchmarking to examine the execution behavior of load/store instructions, including load/store parallelism and the utilization of memory load/store ports. We find that a single load/store port forms a system bottleneck: a superscalar processor benefits from multiple load/store ports, and system performance saturates at two ports. Supporting multiple load/store ports requires a carefully designed memory system, so we consider the design of the data cache subsystem. The data cache configurations we investigate are the multiported cache, the multibank cache, and the duplicated cache. Benchmarking shows that the duplicated cache performs well in most benchmarks, although its cost is higher. In a superscalar multiprocessing environment, maintaining memory consistency requires attention to the load/store ordering of the processors. In superscalar processors, load/store ordering may take one of three forms: total ordering, load bypassing, and load forwarding. We conclude that supporting the sequential consistency model requires the load/store instructions to be totally ordered, whereas load bypassing and load forwarding are sufficient to support the processor consistency model.
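
The three load/store orderings named in the abstract can be illustrated with a small store-buffer model. This is only a sketch, not the authors' simulator: the abstract does not define the three forms, so the code assumes their common textbook meaning (total ordering: a load waits for all older stores; load bypassing: a load may pass older stores to different addresses; load forwarding: a load to a matching address also takes its value directly from the buffered store). The names `Policy`, `PendingStore`, and `issue_load` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Policy(Enum):
    TOTAL_ORDERING = "total ordering"
    LOAD_BYPASSING = "load bypassing"
    LOAD_FORWARDING = "load forwarding"


@dataclass
class PendingStore:
    """A store that is older than the load and has not yet reached memory."""
    addr: int
    value: int


def issue_load(
    addr: int,
    store_buffer: List[PendingStore],
    memory: Dict[int, int],
    policy: Policy,
) -> Tuple[str, Optional[int]]:
    """Decide what a load does when older stores are still buffered.

    Returns (action, value), where action is "stall", "memory", or "forward".
    store_buffer is ordered oldest first. Assumed semantics, see the lead-in.
    """
    # Youngest buffered store to the same address, if any.
    match = next((s for s in reversed(store_buffer) if s.addr == addr), None)

    if policy is Policy.TOTAL_ORDERING:
        # The load may not pass any older store, whatever its address.
        if store_buffer:
            return ("stall", None)
        return ("memory", memory.get(addr, 0))

    if match is None:
        # No address conflict: the load bypasses the buffered stores.
        return ("memory", memory.get(addr, 0))

    if policy is Policy.LOAD_FORWARDING:
        # Address conflict: take the value straight from the store buffer.
        return ("forward", match.value)

    # Load bypassing without forwarding: wait until the store drains.
    return ("stall", None)


if __name__ == "__main__":
    memory = {0x10: 1, 0x20: 2}
    buffered = [PendingStore(addr=0x10, value=99)]
    for policy in Policy:
        for addr in (0x10, 0x20):
            print(policy.value, hex(addr), issue_load(addr, buffered, memory, policy))
```

In this model only total ordering keeps every load behind all older stores, which is the property the abstract ties to sequential consistency. The other two policies let a load complete before an older store becomes visible to memory.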