{"title":"High Performance and Predictable Shared Last-level Cache for Safety-Critical Systems","authors":"Zhuanhao Wu, A. Kaushik, Hiren D. Patel","doi":"10.1145/3687308","DOIUrl":null,"url":null,"abstract":"We propose ZeroCost-LLC (ZCLLC), a novel shared inclusive last-level cache (LLC) design for timing predictable multi-core platforms that offers lower worst-case latency (WCL) when compared to a traditional shared inclusive LLC design. ZCLLC achieves low WCL by eliminating certain memory operations in the form of cache line invalidations across the cache hierarchy that are a consequence of a core’s memory request that misses in the cache hierarchy and when there is no vacant entry in the LLC to accommodate the fetched data for this request. In addition to low WCL, ZCLLC offers performance benefits in the form of additional caching capacity and unlike state-of-the-art approaches, ZCLLC does not impose any constraints on its usage across multiple cores. In this work, we describe the impact of LLC cache line invalidations on the WCL and systematically build solutions to eliminate these invalidations resulting in ZCLLC. We also present ZCLLC, an optimized variant of ZCLLC that offers lower WCL and improved average-case performance over ZCLLC. We apply optimizations to the shared bus arbitration mechanism and extend the micro-architecture of ZCLLC to allow for overlapping memory requests to the main memory. Our analysis reveals that the analytical WCL of a memory request under ZCLLC is 87.0%, 93.8%, and 97.1% lower than that under state-of-the-art LLC partition sharing techniques for 2, 4, and 8 cores, respectively. ZCLLC shows average-case performance speedups of 1.89 ×, 3.36 ×, and 6.24 × compared to the state-of-the-art LLC partition sharing techniques for 2, 4, and 8 cores, respectively. When compared to the original ZCLLC that does not have any optimizations, ZCLLC shows lower analytical WCLs that are 76.5%, 82.6%, and 86.2% lower compared to ZCLLC-NORMAL for 2, 4, and 8 cores, respectively.","PeriodicalId":50914,"journal":{"name":"ACM Transactions on Embedded Computing Systems","volume":null,"pages":null},"PeriodicalIF":2.8000,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Embedded Computing Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3687308","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Abstract
We propose ZeroCost-LLC (ZCLLC), a novel shared inclusive last-level cache (LLC) design for timing-predictable multi-core platforms that offers lower worst-case latency (WCL) than a traditional shared inclusive LLC design. ZCLLC achieves low WCL by eliminating the cache line invalidations that propagate across the cache hierarchy when a core’s memory request misses in the hierarchy and the LLC has no vacant entry to accommodate the fetched data. In addition to low WCL, ZCLLC offers performance benefits in the form of additional caching capacity, and, unlike state-of-the-art approaches, it does not impose any constraints on how the LLC is used across multiple cores. In this work, we describe the impact of LLC cache line invalidations on the WCL and systematically build solutions that eliminate these invalidations, resulting in ZCLLC. We also present an optimized variant of ZCLLC that offers a lower WCL and better average-case performance than the base design; it optimizes the shared bus arbitration mechanism and extends the micro-architecture of ZCLLC to allow overlapping memory requests to the main memory. Our analysis reveals that the analytical WCL of a memory request under ZCLLC is 87.0%, 93.8%, and 97.1% lower than under state-of-the-art LLC partition sharing techniques for 2, 4, and 8 cores, respectively. ZCLLC shows average-case performance speedups of 1.89×, 3.36×, and 6.24× over the same partition sharing techniques for 2, 4, and 8 cores, respectively. Compared to the original, unoptimized design (ZCLLC-NORMAL), the analytical WCLs of the optimized variant are 76.5%, 82.6%, and 86.2% lower for 2, 4, and 8 cores, respectively.
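For readers unfamiliar with the invalidations the abstract refers to, the short Python sketch below models a single set of an inclusive LLC shared by several cores. It is a minimal illustration of the baseline behavior ZCLLC targets, not the authors' design: when a miss finds no vacant way, a victim line must be evicted and, to preserve inclusion, its copies must be invalidated in the private caches. The set size, line addresses, and victim-selection policy here are assumptions made only for this example.

```python
import random

class InclusiveLLCSet:
    """One set of a shared inclusive LLC; private caches hold copies of LLC lines.
    Illustrative baseline model only, not ZCLLC itself."""

    def __init__(self, ways=2):
        self.ways = ways
        self.lines = set()      # line addresses resident in this LLC set
        self.copies = {}        # line address -> set of core IDs with a private copy

    def access(self, core, line):
        if line in self.lines:
            self.copies.setdefault(line, set()).add(core)
            return "hit"
        # LLC miss: with no vacant way, a victim is evicted and, to keep the
        # hierarchy inclusive, every private-cache copy of it is invalidated.
        invalidated = 0
        if len(self.lines) == self.ways:
            victim = random.choice(sorted(self.lines))   # placeholder victim policy
            self.lines.remove(victim)
            invalidated = len(self.copies.pop(victim, set()))
        self.lines.add(line)
        self.copies[line] = {core}
        return f"miss, {invalidated} private-cache invalidation(s)"


llc_set = InclusiveLLCSet(ways=2)
for core, line in [(0, 0xA0), (1, 0xB0), (0, 0xC0)]:
    print(f"core {core} -> line {line:#x}: {llc_set.access(core, line)}")
```

In a traditional inclusive design, these back-invalidations sit on the critical path of the missing request, which is why eliminating them, as the abstract describes, lowers the worst-case latency.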
Journal description:
The design of embedded computing systems, both the software and hardware, increasingly relies on sophisticated algorithms, analytical models, and methodologies. ACM Transactions on Embedded Computing Systems (TECS) aims to present the leading work relating to the analysis, design, behavior, and experience with embedded computing systems.