TCP:标签关联预取器

The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings. Pub Date : 2003-02-08 DOI:10.1109/HPCA.2003.1183549

Zhigang Hu, M. Martonosi, S. Kaxiras

{"title":"TCP:标签关联预取器","authors":"Zhigang Hu, M. Martonosi, S. Kaxiras","doi":"10.1109/HPCA.2003.1183549","DOIUrl":null,"url":null,"abstract":"Although caches for decades have been the backbone of the memory system, the speed gap between CPU and main memory suggests their augmentation with prefetching mechanisms. Recently, sophisticated hardware correlating prefetching mechanisms have been proposed, in some cases coupled with some form of dead-block prediction. In many proposals, however correlating prefetchers demand a significant investment in hardware. In this paper we show that correlating prefetchers that work with tags instead of cache-line addresses are significantly more resource-efficient, providing equal or better performance than previous proposals. We support this claim by showing that per-set tag sequences exhibit highly repetitive patterns both within a set and across different sets. Because a single tag sequence can capture multiple address sequences spread over different cache sets, significant space savings can be achieved. We propose a tag-based prefetcher called a tag correlating prefetcher (TCP). Even with very small history tables, TCP outperforms address-based correlating prefetchers many times larger. In addition, we show that such a prefetcher can yield most of its performance benefits if placed at the L2 level of an aggressive out-of-order processor. Only if one wants prefetching all the way up to L1, is dead-block prediction required. Finally, we draw parallels between the two-level structure of TCP and similar structures for branch prediction mechanisms; these parallels raise interesting opportunities for improving correlating memory prefetchers by harnessing lessons already learned for correlating branch predictors.","PeriodicalId":150992,"journal":{"name":"The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2003-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"97","resultStr":"{\"title\":\"TCP: tag correlating prefetchers\",\"authors\":\"Zhigang Hu, M. Martonosi, S. Kaxiras\",\"doi\":\"10.1109/HPCA.2003.1183549\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Although caches for decades have been the backbone of the memory system, the speed gap between CPU and main memory suggests their augmentation with prefetching mechanisms. Recently, sophisticated hardware correlating prefetching mechanisms have been proposed, in some cases coupled with some form of dead-block prediction. In many proposals, however correlating prefetchers demand a significant investment in hardware. In this paper we show that correlating prefetchers that work with tags instead of cache-line addresses are significantly more resource-efficient, providing equal or better performance than previous proposals. We support this claim by showing that per-set tag sequences exhibit highly repetitive patterns both within a set and across different sets. Because a single tag sequence can capture multiple address sequences spread over different cache sets, significant space savings can be achieved. We propose a tag-based prefetcher called a tag correlating prefetcher (TCP). Even with very small history tables, TCP outperforms address-based correlating prefetchers many times larger. In addition, we show that such a prefetcher can yield most of its performance benefits if placed at the L2 level of an aggressive out-of-order processor. Only if one wants prefetching all the way up to L1, is dead-block prediction required. Finally, we draw parallels between the two-level structure of TCP and similar structures for branch prediction mechanisms; these parallels raise interesting opportunities for improving correlating memory prefetchers by harnessing lessons already learned for correlating branch predictors.\",\"PeriodicalId\":150992,\"journal\":{\"name\":\"The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.\",\"volume\":\"10 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2003-02-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"97\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.2003.1183549\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2003.1183549","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 97

摘要

虽然缓存几十年来一直是内存系统的支柱，但CPU和主存之间的速度差距表明，它们可以通过预取机制得到增强。最近，已经提出了复杂的硬件相关预取机制，在某些情况下与某种形式的死块预测相结合。然而，在许多建议中，关联预取器需要在硬件上进行大量投资。在本文中，我们展示了与标签而不是缓存行地址一起工作的关联预取器明显更节约资源，提供与以前的建议相同或更好的性能。我们通过显示每个集合标签序列在一个集合内和跨不同集合表现出高度重复的模式来支持这一说法。由于单个标记序列可以捕获分布在不同缓存集中的多个地址序列，因此可以实现显著的空间节省。我们提出了一种基于标签的预取器，称为标签相关预取器(TCP)。即使使用非常小的历史表，TCP的性能也优于基于地址的关联预取器。此外，我们还表明，如果将这样的预取器放在主动乱序处理器的L2级，则可以获得大部分性能优势。只有当想要一直预取到L1时，才需要死块预测。最后，我们将TCP的两层结构与分支预测机制的类似结构进行了类比;这些相似之处为改进相关内存预取器提供了有趣的机会，可以利用已经从相关分支预测器中学到的经验教训。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

TCP: tag correlating prefetchers

Although caches for decades have been the backbone of the memory system, the speed gap between CPU and main memory suggests their augmentation with prefetching mechanisms. Recently, sophisticated hardware correlating prefetching mechanisms have been proposed, in some cases coupled with some form of dead-block prediction. In many proposals, however correlating prefetchers demand a significant investment in hardware. In this paper we show that correlating prefetchers that work with tags instead of cache-line addresses are significantly more resource-efficient, providing equal or better performance than previous proposals. We support this claim by showing that per-set tag sequences exhibit highly repetitive patterns both within a set and across different sets. Because a single tag sequence can capture multiple address sequences spread over different cache sets, significant space savings can be achieved. We propose a tag-based prefetcher called a tag correlating prefetcher (TCP). Even with very small history tables, TCP outperforms address-based correlating prefetchers many times larger. In addition, we show that such a prefetcher can yield most of its performance benefits if placed at the L2 level of an aggressive out-of-order processor. Only if one wants prefetching all the way up to L1, is dead-block prediction required. Finally, we draw parallels between the two-level structure of TCP and similar structures for branch prediction mechanisms; these parallels raise interesting opportunities for improving correlating memory prefetchers by harnessing lessons already learned for correlating branch predictors.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.

自引率

0.00%

发文量