Workload and implementation considerations for dynamic base register caching
M. Farrens, A. Park
MICRO 24, 1991-09-01
DOI: 10.1145/123465.123476 (https://doi.org/10.1145/123465.123476)
Citations: 3
Abstract
Dynamic Base Register Caching (DBRC) [Farrens and Park 1990] [Farrens and Park 1991] has been shown to be a useful technique for significantly reducing processor-to-memory address bandwidth. Because the high-order portions of memory addresses are cached in a set of dynamically allocated base registers, only small register indices need to be transmitted between the processor and memory instead of the high-order address bits themselves. In this paper we present the results of trace-driven simulations which indicate that DBRC can facilitate the provision of separate paths for instructions and data by reducing the number of address lines required for parallel address channels. In fact, tailoring DBRC for separate instruction and data streams results in superior address compression. We also show that, for large SPEC benchmark traces, the effectiveness of DBRC is not significantly degraded by a multiprogramming workload. Additionally, we suggest two methods to optimize DBRC implementation: (1) a processor's translation lookaside buffer hardware can be modified to implement DBRC in addition to its normal address translation functions, and (2) DBRC latency can be hidden by properly synchronizing it with memory chip address pin multiplexing.
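The core mechanism, sending a small register index in place of the high-order address bits, can be illustrated with a short simulation. This is only a sketch under stated assumptions, not the authors' simulator: the 32-bit address width, the 16-bit low-order field, the 8 base registers, the LRU replacement policy, and the rule that a miss is charged the full address width are all hypothetical choices, and the names `BaseRegisterCache` and `transmitted_bits` are made up for illustration. The memory side would presumably apply the same update rule to its own copy of the registers so that an index decodes to the same high-order bits on both ends.

```python
from collections import OrderedDict


class BaseRegisterCache:
    """Processor-side table of cached high-order address parts (hypothetical model)."""

    def __init__(self, num_registers=8, low_bits=16, addr_bits=32):
        self.num_registers = num_registers
        self.low_bits = low_bits      # low-order bits are always sent as-is
        self.addr_bits = addr_bits    # width of an uncompressed address
        # Bits needed to name one base register.
        self.index_bits = max(1, (num_registers - 1).bit_length())
        # high-order part -> register index, ordered least to most recently used.
        self.registers = OrderedDict()

    def send(self, address):
        """Return (hit, bits_transmitted) for one address reference."""
        high = address >> self.low_bits
        if high in self.registers:
            # Hit: only the register index and the low-order bits cross the bus.
            self.registers.move_to_end(high)
            return True, self.index_bits + self.low_bits
        # Miss: allocate a register, evicting the least recently used one if full.
        if len(self.registers) < self.num_registers:
            index = len(self.registers)
        else:
            _, index = self.registers.popitem(last=False)
        self.registers[high] = index
        # Assumption: a miss costs a full uncompressed address.
        return False, self.addr_bits


def transmitted_bits(trace, cache):
    """Run an address trace through the cache; return (hits, total address bits sent)."""
    hits = 0
    bits = 0
    for address in trace:
        hit, cost = cache.send(address)
        hits += hit
        bits += cost
    return hits, bits


# A made-up trace that alternates between a code region and a stack region.
trace = [0x00400000, 0x00400004, 0x7FFF0010, 0x00400008, 0x7FFF0014]
hits, bits = transmitted_bits(trace, BaseRegisterCache())
uncompressed = 32 * len(trace)
print(hits, bits, uncompressed)  # 3 121 160
```

Separate instruction and data channels could be modeled in the same sketch by giving each stream its own `BaseRegisterCache` instance, possibly with different `num_registers` and `low_bits` values; which settings work best is a question for the paper's trace-driven results, which this toy trace does not reproduce.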