A Case for Second-Level Software Cache Coherency on Many-Core Accelerators

Arthur Vianès, F. Pétrot, F. Rousseau
{"title":"多核加速器上二级软件缓存一致性的一个案例","authors":"Arthur Vianès, F. Pétrot, F. Rousseau","doi":"10.1109/RSP57251.2022.10038999","DOIUrl":null,"url":null,"abstract":"Cache and cache-coherence are major aspects of today's high performance computing. A cache stores data as cache-lines of fixed size, and coherence between caches is guaranteed by the cache-coherence protocol which operates on fixed size coherency-blocks. In such systems cache-lines and coherency-blocks are usually the same size and are relatively small, typically 64 bytes. This size choice is a trade-off selected for general-purpose computing: it minimizes false-sharing while keeping cache-maintenance traffic low. False-sharing is considered an unnecessary cache-coherence traffic and it decreases performances. However, for dedicated accelerator this trade-off may not be appropriate: hardware in charge of cache-coherence is expensive and not well exploited by most accelerator applications as by construction these applications minimize false-sharing. This paper investigates the possibility of an alternative trade-off of cache-coherency and cache-maintenance block size for many-core accelerators, by decoupling coherency-block and cache-lines sizes. Interests, advantages and difficulties are presented and discussed in this paper. Then we also discuss needs of software and hardware modifications in prototypes and the capability of such prototypes to evaluate different coherence-block sizes.","PeriodicalId":201919,"journal":{"name":"2022 IEEE International Workshop on Rapid System Prototyping (RSP)","volume":"191 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Case for Second-Level Software Cache Coherency on Many-Core Accelerators\",\"authors\":\"Arthur Vianès, F. Pétrot, F. Rousseau\",\"doi\":\"10.1109/RSP57251.2022.10038999\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cache and cache-coherence are major aspects of today's high performance computing. A cache stores data as cache-lines of fixed size, and coherence between caches is guaranteed by the cache-coherence protocol which operates on fixed size coherency-blocks. In such systems cache-lines and coherency-blocks are usually the same size and are relatively small, typically 64 bytes. This size choice is a trade-off selected for general-purpose computing: it minimizes false-sharing while keeping cache-maintenance traffic low. False-sharing is considered an unnecessary cache-coherence traffic and it decreases performances. However, for dedicated accelerator this trade-off may not be appropriate: hardware in charge of cache-coherence is expensive and not well exploited by most accelerator applications as by construction these applications minimize false-sharing. This paper investigates the possibility of an alternative trade-off of cache-coherency and cache-maintenance block size for many-core accelerators, by decoupling coherency-block and cache-lines sizes. Interests, advantages and difficulties are presented and discussed in this paper. 
Then we also discuss needs of software and hardware modifications in prototypes and the capability of such prototypes to evaluate different coherence-block sizes.\",\"PeriodicalId\":201919,\"journal\":{\"name\":\"2022 IEEE International Workshop on Rapid System Prototyping (RSP)\",\"volume\":\"191 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Workshop on Rapid System Prototyping (RSP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/RSP57251.2022.10038999\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Workshop on Rapid System Prototyping (RSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RSP57251.2022.10038999","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Caches and cache coherence are major aspects of today's high-performance computing. A cache stores data as fixed-size cache lines, and coherence between caches is guaranteed by a cache-coherence protocol that operates on fixed-size coherency blocks. In such systems, cache lines and coherency blocks are usually the same size and relatively small, typically 64 bytes. This size is a trade-off chosen for general-purpose computing: it minimizes false sharing while keeping cache-maintenance traffic low. False sharing generates unnecessary cache-coherence traffic and degrades performance. For dedicated accelerators, however, this trade-off may not be appropriate: the hardware in charge of cache coherence is expensive and poorly exploited by most accelerator applications, since these applications minimize false sharing by construction. This paper investigates an alternative trade-off between cache-coherence and cache-maintenance block sizes for many-core accelerators, obtained by decoupling the coherency-block size from the cache-line size. The interest, advantages, and difficulties of this approach are presented and discussed. We then discuss the software and hardware modifications required in prototypes, as well as the ability of such prototypes to evaluate different coherency-block sizes.
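
To make the decoupling concrete, the sketch below illustrates, under assumptions of our own, how coarse-grain software cache maintenance could look from the application side: a producer writes back a whole coherency block after filling it, and a consumer invalidates it before reading, while the hardware continues to cache data in 64-byte lines. The block size, the buffer layout, and the helpers cache_writeback() and cache_invalidate() are hypothetical placeholders for platform-specific primitives; nothing here is taken from the paper's actual prototype.

/*
 * Minimal sketch (not the paper's design): producer/consumer sharing
 * through a software-managed coherency block that is larger than the
 * 64-byte hardware cache line.  COHERENCY_BLOCK, the buffer layout and
 * the maintenance helpers below are illustrative assumptions.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE       64          /* hardware cache-line size (bytes), for contrast */
#define COHERENCY_BLOCK  (4 * 1024)  /* assumed software coherency-block size          */

/*
 * Hypothetical maintenance hooks.  On a hardware-coherent host they can be
 * no-ops (as here); on a non-coherent accelerator they would walk the range
 * line by line and issue the platform's clean/invalidate instructions.
 */
static void cache_writeback(const void *addr, size_t len)  { (void)addr; (void)len; }
static void cache_invalidate(void *addr, size_t len)       { (void)addr; (void)len; }

/* Shared buffer aligned on the coherency block, so maintenance on it never
 * touches cache lines that hold data owned by another core. */
static uint8_t shared_buf[COHERENCY_BLOCK]
        __attribute__((aligned(COHERENCY_BLOCK)));

/* Producer core: fill the buffer, then write the whole block back to the
 * point of coherence in a single coarse-grain maintenance operation. */
void producer_fill(const uint8_t *src, size_t len)
{
    if (len > sizeof shared_buf)
        len = sizeof shared_buf;
    memcpy(shared_buf, src, len);
    cache_writeback(shared_buf, COHERENCY_BLOCK);
}

/* Consumer core: discard any stale cached copy of the block before reading,
 * so the following loads fetch the producer's data. */
void consumer_read(uint8_t *dst, size_t len)
{
    if (len > sizeof shared_buf)
        len = sizeof shared_buf;
    cache_invalidate(shared_buf, COHERENCY_BLOCK);
    memcpy(dst, shared_buf, len);
}

The point of the sketch is only that coherence is enforced at block granularity by software while the hardware keeps managing 64-byte lines; how such maintenance would actually be implemented, and what it costs, is precisely what the prototypes discussed in the paper are meant to evaluate.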