{"title":"多核加速器上二级软件缓存一致性的一个案例","authors":"Arthur Vianès, F. Pétrot, F. Rousseau","doi":"10.1109/RSP57251.2022.10038999","DOIUrl":null,"url":null,"abstract":"Cache and cache-coherence are major aspects of today's high performance computing. A cache stores data as cache-lines of fixed size, and coherence between caches is guaranteed by the cache-coherence protocol which operates on fixed size coherency-blocks. In such systems cache-lines and coherency-blocks are usually the same size and are relatively small, typically 64 bytes. This size choice is a trade-off selected for general-purpose computing: it minimizes false-sharing while keeping cache-maintenance traffic low. False-sharing is considered an unnecessary cache-coherence traffic and it decreases performances. However, for dedicated accelerator this trade-off may not be appropriate: hardware in charge of cache-coherence is expensive and not well exploited by most accelerator applications as by construction these applications minimize false-sharing. This paper investigates the possibility of an alternative trade-off of cache-coherency and cache-maintenance block size for many-core accelerators, by decoupling coherency-block and cache-lines sizes. Interests, advantages and difficulties are presented and discussed in this paper. Then we also discuss needs of software and hardware modifications in prototypes and the capability of such prototypes to evaluate different coherence-block sizes.","PeriodicalId":201919,"journal":{"name":"2022 IEEE International Workshop on Rapid System Prototyping (RSP)","volume":"191 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Case for Second-Level Software Cache Coherency on Many-Core Accelerators\",\"authors\":\"Arthur Vianès, F. Pétrot, F. 
Rousseau\",\"doi\":\"10.1109/RSP57251.2022.10038999\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cache and cache-coherence are major aspects of today's high performance computing. A cache stores data as cache-lines of fixed size, and coherence between caches is guaranteed by the cache-coherence protocol which operates on fixed size coherency-blocks. In such systems cache-lines and coherency-blocks are usually the same size and are relatively small, typically 64 bytes. This size choice is a trade-off selected for general-purpose computing: it minimizes false-sharing while keeping cache-maintenance traffic low. False-sharing is considered an unnecessary cache-coherence traffic and it decreases performances. However, for dedicated accelerator this trade-off may not be appropriate: hardware in charge of cache-coherence is expensive and not well exploited by most accelerator applications as by construction these applications minimize false-sharing. This paper investigates the possibility of an alternative trade-off of cache-coherency and cache-maintenance block size for many-core accelerators, by decoupling coherency-block and cache-lines sizes. Interests, advantages and difficulties are presented and discussed in this paper. 
Then we also discuss needs of software and hardware modifications in prototypes and the capability of such prototypes to evaluate different coherence-block sizes.\",\"PeriodicalId\":201919,\"journal\":{\"name\":\"2022 IEEE International Workshop on Rapid System Prototyping (RSP)\",\"volume\":\"191 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Workshop on Rapid System Prototyping (RSP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/RSP57251.2022.10038999\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Workshop on Rapid System Prototyping (RSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RSP57251.2022.10038999","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Case for Second-Level Software Cache Coherency on Many-Core Accelerators
Caches and cache coherence are major aspects of today's high-performance computing. A cache stores data as fixed-size cache lines, and coherence between caches is guaranteed by the cache-coherence protocol, which operates on fixed-size coherency blocks. In such systems, cache lines and coherency blocks are usually the same size and relatively small, typically 64 bytes. This size is a trade-off selected for general-purpose computing: it minimizes false sharing while keeping cache-maintenance traffic low. False sharing generates unnecessary cache-coherence traffic and degrades performance. However, for dedicated accelerators this trade-off may not be appropriate: the hardware in charge of cache coherence is expensive and poorly exploited by most accelerator applications, since these applications minimize false sharing by construction. This paper investigates an alternative trade-off between cache-coherency and cache-maintenance block sizes for many-core accelerators, obtained by decoupling the coherency-block size from the cache-line size. The interest, advantages, and difficulties of this approach are presented and discussed. We also discuss the software and hardware modifications required in prototypes, and the ability of such prototypes to evaluate different coherency-block sizes.