Understanding the Dynamic Caches on Intel Processors: Methods and Applications

2014 12th IEEE International Conference on Embedded and Ubiquitous Computing. Pub Date: 2014-08-26. DOI: 10.1109/EUC.2014.18

Yi Zhang, Nan Guan, W. Yi

Citations: 9

Abstract

The design and implementation of caches on a given platform significantly affect many areas of computer system design. On chip multiprocessors (CMPs), new cache architectures have been proposed to meet rapidly increasing performance requirements. However, the cache architectures of commercial processors are usually not well documented. This makes it difficult to understand precisely how many processor components work, including not only the cache itself but also related components such as the whole memory subsystem. This paper aims to disclose the working principles of the last-level cache of Intel Ivy Bridge processors. First, we identify the address translation logic of this cache. Second, we disclose its replacement policy: a dynamic insertion replacement policy, which differs substantially from the widely used LRU policy and its variants. Although this replacement policy has been proposed in the academic literature, our work is the first to show that it is actually used in commercial processors. To show the significance of this discovery, we design a methodology to generate controllable cache miss sequences on this cache, and apply it to the design of a benchmark that models memory concurrency. Evaluations on physical machines show the effectiveness of the proposed method.
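The difference between classic LRU and an insertion-based policy can be illustrated with a toy model. The Python sketch below is not taken from the paper and does not claim to reproduce the Ivy Bridge hardware. It assumes a single 4-way cache set, a cyclic access pattern over 5 lines, and a simplified set-dueling scheme in which two hypothetical leader sets and one follower set see the same access stream. It also uses pure LRU-position insertion instead of the bimodal insertion described in the academic literature. All names and parameters (`CacheSet`, `count_misses_dueling`, `psel_max`) are illustrative assumptions.

```python
class CacheSet:
    """One cache set; lines[0] is the MRU position, lines[-1] the LRU position."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = []

    def access(self, tag, insert_at_mru):
        """Access a line and return True on a hit.

        A hit always promotes the line to MRU. A miss evicts the LRU line
        if the set is full, then inserts the new line at MRU (classic LRU)
        or at LRU (LRU-position insertion).
        """
        if tag in self.lines:
            self.lines.remove(tag)
            self.lines.insert(0, tag)
            return True
        if len(self.lines) == self.ways:
            self.lines.pop()  # evict the line in the LRU position
        if insert_at_mru:
            self.lines.insert(0, tag)
        else:
            self.lines.append(tag)
        return False


def count_misses(accesses, ways, insert_at_mru):
    """Misses of one set under a fixed (static) insertion policy."""
    cache_set = CacheSet(ways)
    return sum(not cache_set.access(tag, insert_at_mru) for tag in accesses)


def count_misses_dueling(accesses, ways, psel_max=15):
    """Misses of a follower set that picks its insertion policy by set dueling.

    Toy assumption: both leader sets and the follower see the same stream.
    """
    mru_leader = CacheSet(ways)  # always inserts at MRU
    lru_leader = CacheSet(ways)  # always inserts at LRU
    follower = CacheSet(ways)
    psel = psel_max // 2         # saturating policy-selection counter
    misses = 0
    for tag in accesses:
        if not mru_leader.access(tag, insert_at_mru=True):
            psel = min(psel + 1, psel_max)
        if not lru_leader.access(tag, insert_at_mru=False):
            psel = max(psel - 1, 0)
        # A high counter means MRU insertion is missing more often,
        # so the follower switches to LRU-position insertion.
        follower_inserts_mru = psel <= psel_max // 2
        if not follower.access(tag, insert_at_mru=follower_inserts_mru):
            misses += 1
    return misses


if __name__ == "__main__":
    pattern = list(range(5)) * 100  # 5 lines cycled through a 4-way set
    print("LRU insertion:     ", count_misses(pattern, 4, insert_at_mru=True))
    print("LRU-position insert:", count_misses(pattern, 4, insert_at_mru=False))
    print("Set dueling:        ", count_misses_dueling(pattern, 4))
```

On this thrashing pattern, MRU insertion misses on all 500 accesses, while LRU-position insertion misses 203 times because three of the five lines stay resident. The dueling follower starts with MRU insertion and then switches once the counter shows that policy is losing. A cyclic pattern slightly larger than the associativity is therefore a natural probe for telling the two behaviours apart.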
