H. Irie, N. Hattori, M. Takada, N. Hatta, T. Toyoshima, S. Sakai
{"title":"Steering and forwarding techniques for reducing memory communication on a clustered microarchitecture","authors":"H. Irie, N. Hattori, M. Takada, N. Hatta, T. Toyoshima, S. Sakai","doi":"10.1109/IWIA.2005.41","DOIUrl":null,"url":null,"abstract":"In a clustered micro architecture design, the execution core which has large RAMs, large CAMs and all connected result bypass loops is partitioned into smaller execution cores that are called clusters. Clustered microarchitecture can allow a scalable core design because intra-cluster operation remains fast regardless of entire execution width of the core. But localization of critical memory transfers (store-load-consumer) is still a problem. In this work, we propose a technique named \"distributed speculative memory forwarding (DSMF)\" that localizes critical memory transfers into a cluster. DSMF learns memory dependences at retire stage, steers dependent pair of the store and the consumer to the same cluster, transfers data locally in the cluster. We show that the IPC improvement of 15% was obtained by this localization on the baseline clustered microarchitecture.","PeriodicalId":103456,"journal":{"name":"Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA'05)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2005-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA'05)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IWIA.2005.41","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
In a clustered micro architecture design, the execution core which has large RAMs, large CAMs and all connected result bypass loops is partitioned into smaller execution cores that are called clusters. Clustered microarchitecture can allow a scalable core design because intra-cluster operation remains fast regardless of entire execution width of the core. But localization of critical memory transfers (store-load-consumer) is still a problem. In this work, we propose a technique named "distributed speculative memory forwarding (DSMF)" that localizes critical memory transfers into a cluster. DSMF learns memory dependences at retire stage, steers dependent pair of the store and the consumer to the same cluster, transfers data locally in the cluster. We show that the IPC improvement of 15% was obtained by this localization on the baseline clustered microarchitecture.