CAESAR: Coherence-Aided Elective and Seamless Alternative Routing via on-chip FPGA

2022 IEEE Real-Time Systems Symposium (RTSS), Pub Date: 2022-12-01, DOI: 10.1109/RTSS55097.2022.00038

Shahin Roozkhosh, Denis Hoornaert, R. Mancuso


Citations: 3

Abstract

Prompted by the ever-growing demand for high-performance Systems-on-Chip (SoCs) and the plateauing of CPU frequencies, the SoC design landscape is shifting. In a quest to offer programmable specialization, major vendors have embraced tightly-coupled FPGAs co-located with traditional compute clusters. This $\mathbf{CPU}+\mathbf{FPGA}$ architectural paradigm opens the door to novel hardware/software co-design opportunities. The key principle is that CPU-originated memory traffic can be re-routed through the FPGA for analysis and management purposes. Albeit promising, this approach has a side effect: time-critical operations, such as cache-line refills, are fulfilled by moving data over slower interconnects meant for I/O traffic. In this article, we introduce a novel principle named Cache Coherence Backstabbing to tackle precisely these shortcomings. The technique leverages the ability to include the FPGA in the same coherence domain as the core processing elements. Importantly, this enables Coherence-Aided Elective and Seamless Alternative Routing (CAESAR), i.e., seamless inspection and routing of memory transactions, especially cache-line refills, through the FPGA. CAESAR allows the definition of new memory programming paradigms. We discuss the intrinsic potential of the approach and evaluate it with a full-stack prototype implementation on a commercial platform. Our experiments show an improvement of up to 29% in read bandwidth, 23% in latency, and 13% in pragmatic workloads over the state of the art. Furthermore, we showcase the first in-coherence-domain run-time profiler design as a use case of the CAESAR approach.

