Improving cache performance by combining cost-sensitivity and locality principles in cache replacement algorithms

Rami Sheikh, Mazen Kharbutli
{"title":"在缓存替换算法中结合成本敏感性和局部性原则提高缓存性能","authors":"Rami Sheikh, Mazen Kharbutli","doi":"10.1109/ICCD.2010.5647594","DOIUrl":null,"url":null,"abstract":"Due to the ever increasing performance gap between the processor and the main memory, it becomes crucial to bridge that gap by designing an efficient memory hierarchy capable of reducing the average memory access time. The cache replacement algorithm plays a central role in designing an efficient memory hierarchy. Many of the recent studies in cache replacement algorithms have focused on improving L2 cache replacement algorithms by minimizing the miss count. However, depending on the dependency chain, cache miss bursts, and other factors, a processor's ability to partially hide the cost of an L2 cache miss varies; that is, cache miss costs are not uniform. Therefore, a better solution would account also for the aggregate miss cost in designing cache replacement algorithms. Our proposed solution combines the two principles of locality and cost-sensitivity into one which we call: LACS: Locality-Aware Cost-Sensitive cache replacement algorithm. LACS estimates a cache block's cost from the number of instructions the processor manages to issue during a cache miss on that block and then victimizes cache blocks with low cost and poor locality in order to maximize the overall cache performance. When LACS is evaluated using a uniprocessor architecture model, it speeds up 10 L2 cache performance-constrained SPEC CPU2000 benchmarks by up to 85% and 15% on average while not slowing down any of the 20 SPEC CPU2000 benchmarks evaluated. When evaluated using a dual-core CMP architecture model, LACS speeds up 6 SPEC CPU2000 benchmark pairs by up to 44% and 11% on average.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"70 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":"{\"title\":\"Improving cache performance by combining cost-sensitivity and locality principles in cache replacement algorithms\",\"authors\":\"Rami Sheikh, Mazen Kharbutli\",\"doi\":\"10.1109/ICCD.2010.5647594\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to the ever increasing performance gap between the processor and the main memory, it becomes crucial to bridge that gap by designing an efficient memory hierarchy capable of reducing the average memory access time. The cache replacement algorithm plays a central role in designing an efficient memory hierarchy. Many of the recent studies in cache replacement algorithms have focused on improving L2 cache replacement algorithms by minimizing the miss count. However, depending on the dependency chain, cache miss bursts, and other factors, a processor's ability to partially hide the cost of an L2 cache miss varies; that is, cache miss costs are not uniform. Therefore, a better solution would account also for the aggregate miss cost in designing cache replacement algorithms. Our proposed solution combines the two principles of locality and cost-sensitivity into one which we call: LACS: Locality-Aware Cost-Sensitive cache replacement algorithm. LACS estimates a cache block's cost from the number of instructions the processor manages to issue during a cache miss on that block and then victimizes cache blocks with low cost and poor locality in order to maximize the overall cache performance. 
When LACS is evaluated using a uniprocessor architecture model, it speeds up 10 L2 cache performance-constrained SPEC CPU2000 benchmarks by up to 85% and 15% on average while not slowing down any of the 20 SPEC CPU2000 benchmarks evaluated. When evaluated using a dual-core CMP architecture model, LACS speeds up 6 SPEC CPU2000 benchmark pairs by up to 44% and 11% on average.\",\"PeriodicalId\":182350,\"journal\":{\"name\":\"2010 IEEE International Conference on Computer Design\",\"volume\":\"70 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-11-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"23\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE International Conference on Computer Design\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCD.2010.5647594\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE International Conference on Computer Design","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD.2010.5647594","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 23

Abstract

Due to the ever increasing performance gap between the processor and the main memory, it becomes crucial to bridge that gap by designing an efficient memory hierarchy capable of reducing the average memory access time. The cache replacement algorithm plays a central role in designing an efficient memory hierarchy. Many of the recent studies in cache replacement algorithms have focused on improving L2 cache replacement algorithms by minimizing the miss count. However, depending on the dependency chain, cache miss bursts, and other factors, a processor's ability to partially hide the cost of an L2 cache miss varies; that is, cache miss costs are not uniform. Therefore, a better solution would account also for the aggregate miss cost in designing cache replacement algorithms. Our proposed solution combines the two principles of locality and cost-sensitivity into one which we call: LACS: Locality-Aware Cost-Sensitive cache replacement algorithm. LACS estimates a cache block's cost from the number of instructions the processor manages to issue during a cache miss on that block and then victimizes cache blocks with low cost and poor locality in order to maximize the overall cache performance. When LACS is evaluated using a uniprocessor architecture model, it speeds up 10 L2 cache performance-constrained SPEC CPU2000 benchmarks by up to 85% and 15% on average while not slowing down any of the 20 SPEC CPU2000 benchmarks evaluated. When evaluated using a dual-core CMP architecture model, LACS speeds up 6 SPEC CPU2000 benchmark pairs by up to 44% and 11% on average.
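The abstract describes LACS only at a high level, so the following is a minimal, hypothetical Python sketch of how such a cost-aware victim selection might look: each block records how many instructions the processor issued while its last miss on that block was outstanding (a proxy for how well the miss cost was hidden) and a recency bit as the locality hint, and eviction prefers blocks that are both cheap and cold. The threshold value, field names, and fallback policy below are assumptions made for illustration and are not taken from the paper.

from dataclasses import dataclass
from typing import List

# Sketch of a LACS-style victim selection policy, reconstructed from the
# abstract only. A block's miss cost is approximated by how many
# instructions the processor managed to issue while the miss on that block
# was outstanding: many issued instructions mean the miss was well hidden,
# i.e. the block is cheap to re-fetch. Blocks that are both cheap and show
# poor locality (not recently referenced) are preferred victims.

LOW_COST_THRESHOLD = 32  # assumed issue count above which a miss counts as "cheap"


@dataclass
class CacheBlock:
    tag: int
    issued_during_miss: int = 0  # instructions issued while this block's last miss was pending
    referenced: bool = False     # simple recency/locality hint


def select_victim(cache_set: List[CacheBlock]) -> int:
    """Return the index of the block to evict from this set."""
    # 1. Prefer blocks that are cheap to re-fetch AND show poor locality.
    candidates = [
        i for i, b in enumerate(cache_set)
        if b.issued_during_miss >= LOW_COST_THRESHOLD and not b.referenced
    ]
    if candidates:
        # Among cheap, cold blocks, evict the cheapest one.
        return max(candidates, key=lambda i: cache_set[i].issued_during_miss)

    # 2. Fallback (assumed): evict any non-referenced block; if every block
    #    was recently referenced, clear the recency bits and evict the
    #    cheapest block overall.
    for i, b in enumerate(cache_set):
        if not b.referenced:
            return i
    for b in cache_set:
        b.referenced = False
    return max(range(len(cache_set)), key=lambda i: cache_set[i].issued_during_miss)

In a full simulator, the cache controller would set referenced on every hit to a block and, when an L2 fill returns, store the difference of an issued-instruction counter sampled at the start and end of the miss into issued_during_miss; how the real hardware gathers and ages these counters is specified in the paper, not here.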