PaLLOC: Pairwise-based low-latency online coordinated resource manager of last-level cache and memory bandwidth on multicore systems

IF 4.1 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Journal of Systems Architecture Pub Date : 2025-04-30 DOI:10.1016/j.sysarc.2025.103427

Yang Bai, Yizhi Huang, Si Chen, Renfa Li

Journal of Systems Architecture, volume 164, Article 103427. Full text: https://www.sciencedirect.com/science/article/pii/S1383762125000992

Citations: 0

Abstract

Modern multicore CPUs integrate large last-level caches (LLC) and provide high memory bandwidth, both of which are generally shared among cores. In many scenarios, co-running applications whose behavior changes dynamically require isolated resources. This drives the need for online partitioning of these shared hardware resources to accommodate applications' differing and time-varying resource demands. However, dynamically managing LLC and memory bandwidth without prior knowledge requires searching many resource configurations to gather sufficient information and find a partitioning solution, which may cause long management latency and limit system performance. To address this problem, we first identify several workload-independent observations and insights through a comprehensive exploration of the configuration space across various benchmarks, which can greatly reduce the number of configuration searches needed. Guided by these findings, we propose a method that integrates two-step allocation with pairwise search techniques to maximize system instructions-per-cycle (IPC) throughput. Building on this method, we design and implement PaLLOC, a novel low-latency online coordinated resource manager for LLC and memory bandwidth on multicore systems. Comprehensive evaluations on a commodity Intel server demonstrate that PaLLOC consistently delivers a significant performance advantage across various system workloads with diverse resource requirements, achieving a 1.14x to 1.47x speedup in system IPC throughput over the state-of-the-art online partitioning method, with a management latency of approximately 300 ms under a monitoring period of 10 ms.
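The abstract does not spell out how the pairwise search works, so the following is only an illustrative sketch of one generic way a pairwise search over a partition could look, not PaLLOC's actual algorithm. It assumes a hypothetical `system_ipc` callback that, on a real machine, would apply a partition and measure system IPC over one monitoring period; here it is replaced by a made-up toy model. The names `pairwise_search`, `toy_system_ipc`, the application names, and the unit of "ways" are all assumptions for illustration, and the two-step allocation mentioned in the abstract is not modeled.

```python
from itertools import permutations
from typing import Callable, Dict

# Hypothetical representation: application name -> number of LLC ways it owns.
Allocation = Dict[str, int]


def pairwise_search(
    alloc: Allocation,
    system_ipc: Callable[[Allocation], float],
    min_ways: int = 1,
) -> Allocation:
    """Greedy pairwise hill climbing (an assumed, generic technique).

    Repeatedly move one way from a donor to a receiver whenever the move
    raises the system IPC reported by `system_ipc`. Stops when no single
    pairwise move improves the objective.
    """
    best = dict(alloc)
    best_ipc = system_ipc(best)
    improved = True
    while improved:
        improved = False
        for donor, receiver in permutations(best, 2):
            if best[donor] <= min_ways:
                continue  # never shrink an application below the floor
            trial = dict(best)
            trial[donor] -= 1
            trial[receiver] += 1
            ipc = system_ipc(trial)
            if ipc > best_ipc:
                best, best_ipc = trial, ipc
                improved = True
    return best


def toy_system_ipc(alloc: Allocation) -> float:
    """Made-up stand-in for a real measurement: each application's IPC grows
    linearly with its ways and saturates at 8 ways."""
    gain_per_way = {"cache_hungry": 1.0, "stream": 0.1}
    return sum(gain_per_way[app] * min(ways, 8) for app, ways in alloc.items())


result = pairwise_search({"cache_hungry": 5, "stream": 6}, toy_system_ipc)
print(result)  # {'cache_hungry': 8, 'stream': 3}
```

Each trial in this sketch costs one call to `system_ipc`, which on real hardware would mean one monitoring period, so the number of trials directly determines management latency. That is the cost the abstract says its workload-independent observations are meant to reduce.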


Source journal

Journal of Systems Architecture (Engineering & Technology, Computer Science: Hardware)

CiteScore: 8.70

Self-citation rate: 15.60%

Articles published: 226

Review time: 46 days

About the journal: The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures, as well as additional subjects in the computer and system architecture area, fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software. Design automation of such systems, including methodologies, techniques, and tools for their design, as well as novel designs of software components, falls within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components with an emphasis on software are also relevant here.