DELTA:基于磁贴的多处理器的分布式位置感知缓存分区

2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2020-05-01 DOI:10.1109/IPDPS47924.2020.00066

N. Holtryd, M. Manivannan, P. Stenström, M. Pericàs

{"title":"DELTA:基于磁贴的多处理器的分布式位置感知缓存分区","authors":"N. Holtryd, M. Manivannan, P. Stenström, M. Pericàs","doi":"10.1109/IPDPS47924.2020.00066","DOIUrl":null,"url":null,"abstract":"Cache partitioning in tile-based CMP architectures is a challenging problem because of i) the need to determine capacity allocations with low computational overhead and ii) the need to place allocations close to where they are used, in order to reduce access latency. Although, previous solutions have addressed the problem of reducing the computational overhead and incorporating locality-awareness, they suffer from the overheads of centrally determining allocations.In this paper, we propose DELTA, a novel distributed and locality-aware cache partitioning solution which works by exchanging asynchronous challenges among cores. The distributed nature of the algorithm coupled with the low computational complexity allows for frequent reconfigurations at negligible cost and for the scheme to be implemented directly in hardware. The allocation algorithm is supported by an enforcement mechanism which enables locality-aware placement of data. We evaluate DELTA on 16- and 64-core tiled CMPs with multi-programmed workloads. Our evaluation shows that DELTA improves performance by 9% and 16%, respectively, on average, compared to an unpartitioned shared last-level cache.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"48 1","pages":"578-589"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"DELTA: Distributed Locality-Aware Cache Partitioning for Tile-based Chip Multiprocessors\",\"authors\":\"N. Holtryd, M. Manivannan, P. Stenström, M. Pericàs\",\"doi\":\"10.1109/IPDPS47924.2020.00066\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cache partitioning in tile-based CMP architectures is a challenging problem because of i) the need to determine capacity allocations with low computational overhead and ii) the need to place allocations close to where they are used, in order to reduce access latency. Although, previous solutions have addressed the problem of reducing the computational overhead and incorporating locality-awareness, they suffer from the overheads of centrally determining allocations.In this paper, we propose DELTA, a novel distributed and locality-aware cache partitioning solution which works by exchanging asynchronous challenges among cores. The distributed nature of the algorithm coupled with the low computational complexity allows for frequent reconfigurations at negligible cost and for the scheme to be implemented directly in hardware. The allocation algorithm is supported by an enforcement mechanism which enables locality-aware placement of data. We evaluate DELTA on 16- and 64-core tiled CMPs with multi-programmed workloads. Our evaluation shows that DELTA improves performance by 9% and 16%, respectively, on average, compared to an unpartitioned shared last-level cache.\",\"PeriodicalId\":6805,\"journal\":{\"name\":\"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"volume\":\"48 1\",\"pages\":\"578-589\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPS47924.2020.00066\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS47924.2020.00066","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

基于tile的CMP体系结构中的缓存分区是一个具有挑战性的问题，因为i)需要以较低的计算开销确定容量分配，ii)需要将分配放置在使用它们的地方附近，以减少访问延迟。尽管以前的解决方案已经解决了减少计算开销和结合位置感知的问题，但它们受到集中确定分配的开销的影响。在本文中，我们提出了DELTA，一种新的分布式和位置感知的缓存分区解决方案，它通过在内核之间交换异步挑战来工作。该算法的分布式特性与较低的计算复杂度相结合，允许以可忽略不计的成本进行频繁的重新配置，并且该方案可以直接在硬件中实现。分配算法由一种强制机制支持，该机制支持数据的位置感知放置。我们在具有多编程工作负载的16核和64核平铺cmp上评估DELTA。我们的评估表明，与未分区的共享最后一级缓存相比，DELTA平均分别提高了9%和16%的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

DELTA: Distributed Locality-Aware Cache Partitioning for Tile-based Chip Multiprocessors

Cache partitioning in tile-based CMP architectures is a challenging problem because of i) the need to determine capacity allocations with low computational overhead and ii) the need to place allocations close to where they are used, in order to reduce access latency. Although, previous solutions have addressed the problem of reducing the computational overhead and incorporating locality-awareness, they suffer from the overheads of centrally determining allocations.In this paper, we propose DELTA, a novel distributed and locality-aware cache partitioning solution which works by exchanging asynchronous challenges among cores. The distributed nature of the algorithm coupled with the low computational complexity allows for frequent reconfigurations at negligible cost and for the scheme to be implemented directly in hardware. The allocation algorithm is supported by an enforcement mechanism which enables locality-aware placement of data. We evaluate DELTA on 16- and 64-core tiled CMPs with multi-programmed workloads. Our evaluation shows that DELTA improves performance by 9% and 16%, respectively, on average, compared to an unpartitioned shared last-level cache.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

自引率

0.00%

发文量