混合虚拟缓存的高效同义词过滤和可伸缩延迟翻译

2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA) Pub Date : 2016-01-01 DOI:10.1145/3007787.3001160

Chang Hyun Park, Taekyung Heo, Jaehyuk Huh

{"title":"混合虚拟缓存的高效同义词过滤和可伸缩延迟翻译","authors":"Chang Hyun Park, Taekyung Heo, Jaehyuk Huh","doi":"10.1145/3007787.3001160","DOIUrl":null,"url":null,"abstract":"Conventional translation look-aside buffers(TLBs) are required to complete address translation withshort latencies, as the address translation is on the criticalpath of all memory accesses even for L1 cache hits. Such strictTLB latency restrictions limit the TLB capacity, as the latencyincrease with large TLBs may lower the overall performanceeven with potential TLB miss reductions. Furthermore, TLBsconsume a significant amount of energy as they are accessedfor every instruction fetch and data access. To avoid thelatency restriction and reduce the energy consumption, virtualcaching techniques have been proposed to defer translation toafter L1 cache misses. However, an efficient solution for thesynonym problem has been a critical issue hindering the wideadoption of virtual caching.Based on the virtual caching concept, this study proposes ahybrid virtual memory architecture extending virtual cachingto the entire cache hierarchy, aiming to improve both performanceand energy consumption. The hybrid virtual cachinguses virtual addresses augmented with address space identifiers(ASID) in the cache hierarchy for common non-synonymaddresses. For such non-synonyms, the address translationoccurs only after last-level cache (LLC) misses. For uncommonsynonym addresses, the addresses are translated to physicaladdresses with conventional TLBs before L1 cache accesses. Tosupport such hybrid translation, we propose an efficient synonymdetection mechanism based on Bloom filters which canidentify synonym candidates with few false positives. For largememory applications, delayed translation alone cannot solvethe address translation problem, as fixed-granularity delayedTLBs may not scale with the increasing memory requirements.To mitigate the translation scalability problem, this studyproposes a delayed many segment translation designed for thehybrid virtual caching. The experimental results show that ourapproach effectively lowers accesses to the TLBs, leading tosignificant power savings. In addition, the approach providesperformance improvement with scalable delayed translationwith variable length segments.","PeriodicalId":6634,"journal":{"name":"2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)","volume":"97 1","pages":"217-229"},"PeriodicalIF":0.0000,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":"{\"title\":\"Efficient synonym filtering and scalable delayed translation for hybrid virtual caching\",\"authors\":\"Chang Hyun Park, Taekyung Heo, Jaehyuk Huh\",\"doi\":\"10.1145/3007787.3001160\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Conventional translation look-aside buffers(TLBs) are required to complete address translation withshort latencies, as the address translation is on the criticalpath of all memory accesses even for L1 cache hits. Such strictTLB latency restrictions limit the TLB capacity, as the latencyincrease with large TLBs may lower the overall performanceeven with potential TLB miss reductions. Furthermore, TLBsconsume a significant amount of energy as they are accessedfor every instruction fetch and data access. To avoid thelatency restriction and reduce the energy consumption, virtualcaching techniques have been proposed to defer translation toafter L1 cache misses. However, an efficient solution for thesynonym problem has been a critical issue hindering the wideadoption of virtual caching.Based on the virtual caching concept, this study proposes ahybrid virtual memory architecture extending virtual cachingto the entire cache hierarchy, aiming to improve both performanceand energy consumption. The hybrid virtual cachinguses virtual addresses augmented with address space identifiers(ASID) in the cache hierarchy for common non-synonymaddresses. For such non-synonyms, the address translationoccurs only after last-level cache (LLC) misses. For uncommonsynonym addresses, the addresses are translated to physicaladdresses with conventional TLBs before L1 cache accesses. Tosupport such hybrid translation, we propose an efficient synonymdetection mechanism based on Bloom filters which canidentify synonym candidates with few false positives. For largememory applications, delayed translation alone cannot solvethe address translation problem, as fixed-granularity delayedTLBs may not scale with the increasing memory requirements.To mitigate the translation scalability problem, this studyproposes a delayed many segment translation designed for thehybrid virtual caching. The experimental results show that ourapproach effectively lowers accesses to the TLBs, leading tosignificant power savings. In addition, the approach providesperformance improvement with scalable delayed translationwith variable length segments.\",\"PeriodicalId\":6634,\"journal\":{\"name\":\"2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)\",\"volume\":\"97 1\",\"pages\":\"217-229\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"21\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3007787.3001160\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3007787.3001160","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 21

摘要

传统的转换暂置缓冲区(tlb)需要以较短的延迟完成地址转换，因为地址转换位于所有内存访问的关键路径上，即使是L1缓存命中。这种严格的lb延迟限制限制了TLB容量，因为大TLB的延迟增加可能会降低整体性能，即使潜在的TLB丢失减少。此外，tlb消耗了大量的能量，因为每次指令读取和数据访问都要访问它们。为了避免延迟限制和减少能量消耗，虚拟缓存技术被提出将转换延迟到L1缓存丢失之后。然而，如何有效地解决同义问题一直是阻碍虚拟缓存广泛应用的关键问题。基于虚拟缓存的概念，本研究提出了一种将虚拟缓存扩展到整个缓存层次的混合虚拟内存架构，旨在提高性能和能耗。混合虚拟缓存在缓存层次结构中使用带有地址空间标识符(ASID)的虚拟地址，用于常见的非同义地址。对于这种非同义词，地址转换只在最后一级缓存(LLC)丢失后发生。对于非常见的同义词地址，在L1缓存访问之前，这些地址被用传统的tlb转换为物理地址。为了支持这种混合翻译，我们提出了一种高效的基于Bloom过滤器的同义词检测机制，该机制可以在很少误报的情况下识别候选同义词。对于大内存应用程序，延迟转换本身不能解决地址转换问题，因为固定粒度的delayedtlb可能无法随着内存需求的增加而扩展。为了缓解翻译的可扩展性问题，本研究提出了一种针对混合虚拟缓存的延迟多段翻译。实验结果表明，我们的方法有效地降低了对tlb的访问，从而显著节省了功耗。此外，该方法还提供了可变长度段的可伸缩延迟平移的性能改进。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Efficient synonym filtering and scalable delayed translation for hybrid virtual caching

Conventional translation look-aside buffers(TLBs) are required to complete address translation withshort latencies, as the address translation is on the criticalpath of all memory accesses even for L1 cache hits. Such strictTLB latency restrictions limit the TLB capacity, as the latencyincrease with large TLBs may lower the overall performanceeven with potential TLB miss reductions. Furthermore, TLBsconsume a significant amount of energy as they are accessedfor every instruction fetch and data access. To avoid thelatency restriction and reduce the energy consumption, virtualcaching techniques have been proposed to defer translation toafter L1 cache misses. However, an efficient solution for thesynonym problem has been a critical issue hindering the wideadoption of virtual caching.Based on the virtual caching concept, this study proposes ahybrid virtual memory architecture extending virtual cachingto the entire cache hierarchy, aiming to improve both performanceand energy consumption. The hybrid virtual cachinguses virtual addresses augmented with address space identifiers(ASID) in the cache hierarchy for common non-synonymaddresses. For such non-synonyms, the address translationoccurs only after last-level cache (LLC) misses. For uncommonsynonym addresses, the addresses are translated to physicaladdresses with conventional TLBs before L1 cache accesses. Tosupport such hybrid translation, we propose an efficient synonymdetection mechanism based on Bloom filters which canidentify synonym candidates with few false positives. For largememory applications, delayed translation alone cannot solvethe address translation problem, as fixed-granularity delayedTLBs may not scale with the increasing memory requirements.To mitigate the translation scalability problem, this studyproposes a delayed many segment translation designed for thehybrid virtual caching. The experimental results show that ourapproach effectively lowers accesses to the TLBs, leading tosignificant power savings. In addition, the approach providesperformance improvement with scalable delayed translationwith variable length segments.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)

自引率

0.00%

发文量