Taking a closer look at memory interference effects in commercial-off-the-shelf multicore SoCs

IF 3.7 · Zone 2 (Computer Science) · JCR Q1: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Journal of Systems Architecture, Volume 167, Article 103487 · Pub Date: 2025-07-04 · DOI: 10.1016/j.sysarc.2025.103487

Lorenzo Carletti, Andrea Serafini, Gianluca Brilli, Alessandro Capotondi, Alessandro Biasci, Paolo Valente, Andrea Marongiu


Citations: 0

Abstract


Taking a closer look at memory interference effects in commercial-off-the-shelf multicore SoCs

Commercial-off-the-shelf (COTS) multicore systems on chip (SoC) represent a cheap and convenient solution for deploying sophisticated workloads in various application domains. The combination of several CPU cores and dedicated acceleration units tightly sharing memory and interconnect systems can provide tremendous peak performance, but also threatens timing predictability due to memory interference. Even when focusing on main CPU cores only, it has been reported that task slowdown due to memory interference can surpass 10×. Such poorly predictable timing behaviors bar greater adoption of COTS multicore SoCs in the domain of timing-critical applications, and motivate the wide activity of the research community to study solutions aimed at mitigating the problem. Understanding worst-case interference patterns on such hardware platforms is fundamental for building any effective memory interference control mechanism. A common assumption in the literature is that worst-case interference is generated by (and therefore assessed through) read-intensive synthetic workloads with 100% cache miss rate. Yet certain real-life workloads exhibit worse slowdown than what is generated under said assumed worst-case, so we study the interference effects of both synthetic and real-life benchmarks on different multicore SoCs. Our experiments indicate that cache thrashing causes the worst interference experienced by real-life benchmarks – due to their different usage of caches – and that there is no universal worst-case workload for every platform.
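The slowdown metric the abstract refers to can be illustrated with a minimal sketch. It assumes slowdown is the ratio of a task's execution time under interference to its execution time in isolation, which is the usual definition but is not spelled out in the abstract. The function names, platform names, workload labels, and timings below are all hypothetical placeholders, not measurements or code from the paper.

```python
def slowdown(t_isolation, t_interference):
    """Ratio of execution time under interference to time in isolation."""
    if t_isolation <= 0:
        raise ValueError("isolation time must be positive")
    return t_interference / t_isolation


def worst_interferer(t_isolation, t_by_interferer):
    """Return (name, slowdown) of the interferer causing the largest slowdown."""
    name = max(t_by_interferer, key=t_by_interferer.get)
    return name, slowdown(t_isolation, t_by_interferer[name])


# Placeholder timings in seconds, invented for illustration only;
# they are NOT measurements from the paper.
measurements = {
    "platform_a": (1.0, {"synthetic_read_miss": 4.0, "real_life_app": 6.0}),
    "platform_b": (2.0, {"synthetic_read_miss": 9.0, "real_life_app": 5.0}),
}

for platform, (t_iso, t_intf) in measurements.items():
    name, factor = worst_interferer(t_iso, t_intf)
    print(f"{platform}: worst interferer = {name}, slowdown = {factor:.1f}x")
```

With these made-up inputs the worst interferer differs between the two platforms, which is the kind of outcome the abstract describes when it says there is no universal worst-case workload.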


Source journal: Journal of Systems Architecture (Engineering Technology - Computer Science: Hardware)
CiteScore: 8.70
Self-citation rate: 15.60%
Articles published: 226
Review time: 46 days

About the journal: The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software. Design automation of such systems, including methodologies, techniques and tools for their design, as well as novel designs of software components fall within the scope of this journal. Novel applications that use embedded systems are also central in this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components with an emphasis on software are also relevant here.