PaLLOC: Pairwise-based low-latency online coordinated resource manager of last-level cache and memory bandwidth on multicore systems
Authors: Yang Bai, Yizhi Huang, Si Chen, Renfa Li
Journal: Journal of Systems Architecture, volume 164, Article 103427 (published 2025-04-30)
DOI: 10.1016/j.sysarc.2025.103427
URL: https://www.sciencedirect.com/science/article/pii/S1383762125000992
Impact factor: 4.1 (JCR Q1, Computer Science, Hardware & Architecture)
Citations: 0
Abstract
Modern multicore CPUs integrate large last-level caches (LLC) and provide high memory bandwidth, both of which are generally shared among cores. In many scenarios, co-running applications whose behavior changes dynamically require isolated resources. This drives the need for online partitioning of these shared hardware resources to accommodate applications’ differing and time-varying resource demands. However, dynamically managing LLC and memory bandwidth without prior knowledge requires searching through many resource configurations to gather sufficient information and find a partitioning solution, which may cause long management latency and limit system performance. To address this problem, we first identify several workload-independent observations and insights through a comprehensive exploration of the configuration space across various benchmarks, which greatly reduce the number of configuration searches needed. Guided by these findings, we propose a method that integrates two-step allocation with pairwise search techniques to maximize system instructions-per-cycle (IPC) throughput. Building on this method, we design and implement PaLLOC, a novel low-latency online coordinated resource manager for LLC and memory bandwidth on multicore systems. Comprehensive evaluations on a commodity Intel server demonstrate that PaLLOC consistently shows a significant performance advantage across various system workloads with diverse resource requirements, achieving a 1.14x-1.47x speedup in system IPC throughput over the state-of-the-art online partitioning method, with a management latency of approximately 300 ms under a monitoring period of 10 ms.
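The abstract names a pairwise search but does not spell out its steps, so the following is only a minimal sketch of the general idea, not PaLLOC's actual algorithm. It assumes a generic greedy hill-climb that moves one resource unit at a time between a pair of applications whenever the move raises total system IPC. The function names, the toy saturating IPC model, and the notion of an integer "unit" (for example, one LLC way) are all hypothetical and chosen purely for illustration.

```python
from itertools import permutations
from typing import Callable, List, Tuple


def pairwise_search(
    alloc: List[int],
    system_ipc: Callable[[List[int]], float],
    min_units: int = 1,
) -> Tuple[List[int], float]:
    """Greedy pairwise hill-climb (illustrative assumption, not PaLLOC itself).

    Repeatedly tries moving one unit from a donor application to a receiver
    and keeps the move only if the measured system IPC strictly improves.
    """
    alloc = list(alloc)
    best = system_ipc(alloc)
    improved = True
    while improved:
        improved = False
        # Visit every ordered (donor, receiver) pair of applications.
        for donor, receiver in permutations(range(len(alloc)), 2):
            if alloc[donor] <= min_units:
                continue  # never take a donor below its minimum share
            trial = alloc.copy()
            trial[donor] -= 1
            trial[receiver] += 1
            score = system_ipc(trial)
            if score > best:
                alloc, best = trial, score
                improved = True
    return alloc, best


# Hypothetical toy model: each application's IPC grows with its units
# until it saturates at a fixed demand. Real IPC would be measured online.
DEMANDS = [2, 6, 4]


def toy_system_ipc(alloc: List[int]) -> int:
    return sum(min(units, demand) for units, demand in zip(alloc, DEMANDS))


final_alloc, final_ipc = pairwise_search([4, 4, 4], toy_system_ipc)
print(final_alloc, final_ipc)
```

In this toy run the search starts from an even split and shifts units away from the application that saturates early toward the one that can still use them. Each call to `system_ipc` stands in for one trial configuration that a real manager would have to apply and measure, which is why reducing the number of searched configurations matters for management latency.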
About the journal:
The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software.
Design automation of such systems, including methodologies, techniques and tools for their design, as well as novel designs of software components, falls within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components, with an emphasis on software, are also relevant here.