High Performance and Predictable Shared Last-level Cache for Safety-Critical Systems

IF 4.6 Q2 MATERIALS SCIENCE, BIOMATERIALS

ACS Applied Bio Materials, Pub Date: 2024-08-08, DOI: 10.1145/3687308

Zhuanhao Wu, A. Kaushik, Hiren D. Patel


Citations: 0

Abstract

We propose ZeroCost-LLC (ZCLLC), a novel shared inclusive last-level cache (LLC) design for timing-predictable multi-core platforms that offers lower worst-case latency (WCL) than a traditional shared inclusive LLC design. ZCLLC achieves low WCL by eliminating certain memory operations, namely the cache line invalidations across the cache hierarchy that occur when a core's memory request misses in the cache hierarchy and the LLC has no vacant entry to accommodate the fetched data. In addition to low WCL, ZCLLC offers performance benefits in the form of additional caching capacity and, unlike state-of-the-art approaches, does not impose any constraints on its usage across multiple cores. In this work, we describe the impact of LLC cache line invalidations on the WCL and systematically build solutions that eliminate these invalidations, resulting in ZCLLC. We also present an optimized variant of ZCLLC that offers lower WCL and improved average-case performance over the original design, referred to as ZCLLC-NORMAL. For this variant, we apply optimizations to the shared bus arbitration mechanism and extend the micro-architecture of ZCLLC to allow overlapping memory requests to the main memory. Our analysis reveals that the analytical WCL of a memory request under ZCLLC is 87.0%, 93.8%, and 97.1% lower than that under state-of-the-art LLC partition sharing techniques for 2, 4, and 8 cores, respectively. ZCLLC shows average-case performance speedups of 1.89×, 3.36×, and 6.24× over the state-of-the-art LLC partition sharing techniques for 2, 4, and 8 cores, respectively. Compared to ZCLLC-NORMAL, the original ZCLLC without any optimizations, the optimized variant has analytical WCLs that are 76.5%, 82.6%, and 86.2% lower for 2, 4, and 8 cores, respectively.
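The invalidations that the abstract attributes to a traditional shared inclusive LLC can be illustrated with a toy model. The sketch below is not the ZCLLC design and does not come from the paper; it models only the baseline behaviour, assuming LRU replacement in the LLC and unbounded private caches. The class `InclusiveLLC`, its fields, and the addresses are all hypothetical names chosen for illustration.

```python
from collections import OrderedDict


class InclusiveLLC:
    """Toy model of a traditional shared inclusive LLC (assumed LRU policy)."""

    def __init__(self, num_cores, llc_entries):
        self.llc_entries = llc_entries
        # Cache lines held in the LLC, ordered from least to most recently used.
        self.llc = OrderedDict()
        # One private cache per core; unbounded here to keep the sketch short.
        self.private = [set() for _ in range(num_cores)]
        # Counts private-cache copies invalidated because of LLC evictions.
        self.back_invalidations = 0

    def access(self, core, addr):
        """Serve one request and return 'private-hit', 'llc-hit', or 'miss'."""
        if addr in self.private[core]:
            # The LLC does not observe private hits, so its LRU order is unchanged.
            return "private-hit"
        if addr in self.llc:
            self.llc.move_to_end(addr)
            self.private[core].add(addr)
            return "llc-hit"
        # The request missed in the whole hierarchy.
        if len(self.llc) >= self.llc_entries:
            # No vacant LLC entry: evict the LRU line. Inclusion then forces
            # every private copy of that line to be invalidated as well.
            victim, _ = self.llc.popitem(last=False)
            for cache in self.private:
                if victim in cache:
                    cache.remove(victim)
                    self.back_invalidations += 1
        self.llc[addr] = None
        self.private[core].add(addr)
        return "miss"


llc = InclusiveLLC(num_cores=2, llc_entries=2)
results = [
    llc.access(0, 0xA),  # miss: line A is installed
    llc.access(0, 0xB),  # miss: line B is installed, the LLC is now full
    llc.access(1, 0xC),  # miss: A is evicted and core 0's copy is invalidated
]
print(results, llc.back_invalidations)  # ['miss', 'miss', 'miss'] 1
```

In this run, core 1's miss removes line A from core 0's private cache, so a later request from core 0 to A would miss again. This cross-core interference is the kind of effect the abstract says contributes to the WCL of a traditional inclusive LLC.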


Source journal

ACS Applied Bio Materials Chemistry-Chemistry (all)

CiteScore: 9.40

Self-citation rate: 2.10%

Articles published: 464

Journal description: ACS Applied Bio Materials is an interdisciplinary journal publishing original research covering all aspects of biomaterials and biointerfaces including and beyond the traditional biosensing, biomedical and therapeutic applications. The journal is devoted to reports of new and original experimental and theoretical research of an applied nature that integrates knowledge in the areas of materials, engineering, physics, bioscience, and chemistry into important bio applications. The journal is specifically interested in work that addresses the relationship between structure and function and assesses the stability and degradation of materials under relevant environmental and biological conditions.