{"title":"多核加速器上二级软件缓存一致性的一个案例","authors":"Arthur Vianès, F. Pétrot, F. Rousseau","doi":"10.1109/RSP57251.2022.10038999","DOIUrl":null,"url":null,"abstract":"Cache and cache-coherence are major aspects of today's high performance computing. A cache stores data as cache-lines of fixed size, and coherence between caches is guaranteed by the cache-coherence protocol which operates on fixed size coherency-blocks. In such systems cache-lines and coherency-blocks are usually the same size and are relatively small, typically 64 bytes. This size choice is a trade-off selected for general-purpose computing: it minimizes false-sharing while keeping cache-maintenance traffic low. False-sharing is considered an unnecessary cache-coherence traffic and it decreases performances. However, for dedicated accelerator this trade-off may not be appropriate: hardware in charge of cache-coherence is expensive and not well exploited by most accelerator applications as by construction these applications minimize false-sharing. This paper investigates the possibility of an alternative trade-off of cache-coherency and cache-maintenance block size for many-core accelerators, by decoupling coherency-block and cache-lines sizes. Interests, advantages and difficulties are presented and discussed in this paper. Then we also discuss needs of software and hardware modifications in prototypes and the capability of such prototypes to evaluate different coherence-block sizes.","PeriodicalId":201919,"journal":{"name":"2022 IEEE International Workshop on Rapid System Prototyping (RSP)","volume":"191 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Case for Second-Level Software Cache Coherency on Many-Core Accelerators\",\"authors\":\"Arthur Vianès, F. Pétrot, F. 
Rousseau\",\"doi\":\"10.1109/RSP57251.2022.10038999\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cache and cache-coherence are major aspects of today's high performance computing. A cache stores data as cache-lines of fixed size, and coherence between caches is guaranteed by the cache-coherence protocol which operates on fixed size coherency-blocks. In such systems cache-lines and coherency-blocks are usually the same size and are relatively small, typically 64 bytes. This size choice is a trade-off selected for general-purpose computing: it minimizes false-sharing while keeping cache-maintenance traffic low. False-sharing is considered an unnecessary cache-coherence traffic and it decreases performances. However, for dedicated accelerator this trade-off may not be appropriate: hardware in charge of cache-coherence is expensive and not well exploited by most accelerator applications as by construction these applications minimize false-sharing. This paper investigates the possibility of an alternative trade-off of cache-coherency and cache-maintenance block size for many-core accelerators, by decoupling coherency-block and cache-lines sizes. Interests, advantages and difficulties are presented and discussed in this paper. 
Then we also discuss needs of software and hardware modifications in prototypes and the capability of such prototypes to evaluate different coherence-block sizes.\",\"PeriodicalId\":201919,\"journal\":{\"name\":\"2022 IEEE International Workshop on Rapid System Prototyping (RSP)\",\"volume\":\"191 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Workshop on Rapid System Prototyping (RSP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/RSP57251.2022.10038999\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Workshop on Rapid System Prototyping (RSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RSP57251.2022.10038999","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Case for Second-Level Software Cache Coherency on Many-Core Accelerators
Caches and cache coherence are major aspects of today's high-performance computing. A cache stores data as fixed-size cache lines, and coherence between caches is guaranteed by the cache-coherence protocol, which operates on fixed-size coherency blocks. In such systems, cache lines and coherency blocks are usually the same size and relatively small, typically 64 bytes. This size is a trade-off selected for general-purpose computing: it minimizes false sharing while keeping cache-maintenance traffic low. False sharing generates unnecessary cache-coherence traffic and degrades performance. However, for dedicated accelerators this trade-off may not be appropriate: the hardware in charge of cache coherence is expensive and poorly exploited by most accelerator applications, since these applications minimize false sharing by construction. This paper investigates an alternative trade-off between cache-coherency and cache-maintenance block sizes for many-core accelerators, obtained by decoupling the coherency-block size from the cache-line size. The interest, advantages, and difficulties of this approach are presented and discussed. We also discuss the software and hardware modifications required in prototypes, and the ability of such prototypes to evaluate different coherency-block sizes.