Variable latency caches for nanoscale processor

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07). Pub Date: 2007-11-16. DOI: 10.1145/1362622.1362650

S. Ozdemir, A. Mallik, J. Ku, G. Memik, Y. Ismail


Citations: 5

Abstract



Variability is one of the important issues in nanoscale processors. Due to the increasing importance of interconnect structures in submicron technologies, physical location and phenomena such as coupling have a growing impact on the latency of operations. Therefore, the traditional view of rigid access latencies to components will result in suboptimal architectures. In this paper, we devise a cache architecture with variable access latency. In particular, we a) develop a non-uniform access level 1 data cache, b) study the impact of coupling and physical location on level 1 data cache access latencies, and c) develop and study an architecture where the variable latency cache can be accessed while the rest of the pipeline remains synchronous. To find the access latency under different input address transitions and environmental conditions, we first build a SPICE model in a 45 nm technology for a cache similar to the level 1 data cache of the Intel Prescott architecture. Motivated by the large difference between the worst-case and best-case latencies and the shape of the distribution curve, we change the cache architecture to allow variable latency accesses. Since the latency of the cache is not known at the time of instruction scheduling, we also modify the functional units by adding special queues that temporarily store the dependent instructions and allow the data to be forwarded from the cache to the functional units correctly. Simulations based on SPEC2000 benchmarks show that our variable access latency cache structure can reduce execution time by as much as 19.4%, and by 10.7% on average, compared to a conventional cache architecture.
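The motivation stated in the abstract (a large gap between worst-case and best-case latencies) can be illustrated with a toy calculation. The following is a minimal sketch, assuming a hypothetical distribution of access latencies in cycles; the numbers and the names `HYPOTHETICAL_DIST`, `mean_latency`, and `worst_case_latency` are illustrative assumptions and are not taken from the paper. It models only per-access cache latency, not the pipeline or the queue mechanism, so it does not reproduce the reported execution-time figures.

```python
from fractions import Fraction


def mean_latency(dist):
    """Expected access latency (cycles) for a {latency: probability} map."""
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(latency * prob for latency, prob in dist.items())


def worst_case_latency(dist):
    """Latency a rigid, fixed-latency design would have to assume for every access."""
    return max(latency for latency, prob in dist.items() if prob > 0)


# Hypothetical latency distribution (cycles -> fraction of accesses).
# These values are made up for illustration only.
HYPOTHETICAL_DIST = {
    2: Fraction(1, 2),
    3: Fraction(3, 10),
    4: Fraction(1, 5),
}

fixed = worst_case_latency(HYPOTHETICAL_DIST)   # every access pays the worst case
variable = mean_latency(HYPOTHETICAL_DIST)      # each access pays its own latency
saving = 1 - variable / fixed                   # fractional reduction in mean latency

print(f"fixed-latency design: {fixed} cycles per access")
print(f"variable-latency design: {float(variable)} cycles per access on average")
print(f"mean access latency reduction: {float(saving):.1%}")
```

Exact fractions are used so the probability check is not affected by floating-point rounding. How much of such a per-access saving translates into execution time depends on how dependent instructions are scheduled, which is the part the paper addresses with its additional queues.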

