H. Irie, N. Hattori, M. Takada, N. Hatta, T. Toyoshima, S. Sakai
Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA'05). Published 2005-01-17. DOI: 10.1109/IWIA.2005.41 (https://doi.org/10.1109/IWIA.2005.41)
Steering and forwarding techniques for reducing memory communication on a clustered microarchitecture
In a clustered microarchitecture design, the execution core, which has large RAMs, large CAMs, and fully connected result bypass loops, is partitioned into smaller execution cores called clusters. A clustered microarchitecture allows a scalable core design because intra-cluster operation remains fast regardless of the total execution width of the core. However, localizing critical memory transfers (store-load-consumer) remains a problem. In this work, we propose a technique named "distributed speculative memory forwarding (DSMF)" that localizes critical memory transfers within a cluster. DSMF learns memory dependences at the retire stage, steers each dependent pair of store and consumer to the same cluster, and transfers the data locally within that cluster. We show that this localization yields an IPC improvement of 15% on the baseline clustered microarchitecture.
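
The learn-then-steer idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' design: the table organization, the round-robin fallback policy, and every name in it (`DependencePredictor`, `Steerer`, `learn_at_retire`, `steer_store`, `steer_consumer`) are assumptions made only for illustration. It records a store/consumer pair when the pair retires, then places a later instance of the consumer in the cluster that received the predicted store.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DependencePredictor:
    """Hypothetical table: consumer PC -> PC of the store that fed it."""
    table: Dict[int, int] = field(default_factory=dict)

    def learn_at_retire(self, store_pc: int, consumer_pc: int) -> None:
        # Called once a store-load-consumer chain has retired, so the
        # dependence is known to be real rather than speculative.
        self.table[consumer_pc] = store_pc

    def predicted_store(self, consumer_pc: int) -> Optional[int]:
        return self.table.get(consumer_pc)


class Steerer:
    """Toy steering logic; round-robin fallback is an assumption."""

    def __init__(self, num_clusters: int, predictor: DependencePredictor):
        self.num_clusters = num_clusters
        self.predictor = predictor
        self.store_cluster: Dict[int, int] = {}  # store PC -> last cluster used
        self.next_cluster = 0

    def _round_robin(self) -> int:
        cluster = self.next_cluster
        self.next_cluster = (self.next_cluster + 1) % self.num_clusters
        return cluster

    def steer_store(self, store_pc: int) -> int:
        # Remember where the store went so its consumer can follow it.
        cluster = self._round_robin()
        self.store_cluster[store_pc] = cluster
        return cluster

    def steer_consumer(self, consumer_pc: int) -> int:
        store_pc = self.predictor.predicted_store(consumer_pc)
        if store_pc in self.store_cluster:
            # Predicted dependence: co-locate with the producing store
            # so the data can be forwarded inside one cluster.
            return self.store_cluster[store_pc]
        return self._round_robin()
```

In this sketch a consumer with no learned dependence is steered like any other instruction, while a consumer with a learned dependence follows its store, which is the localization effect the abstract describes. A real design would also have to handle mispredicted dependences and load balance across clusters, which this toy model ignores.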