MemPol: polling-based microsecond-scale per-core memory bandwidth regulation

IF 1.3, Region 4 (Computer Science), JCR Q3 (COMPUTER SCIENCE, THEORY & METHODS)

Real-Time Systems Pub Date : 2024-06-17 DOI:10.1007/s11241-024-09422-8

Alexander Zuepke, Andrea Bastoni, Weifan Chen, Marco Caccamo, Renato Mancuso

Citation count: 0

Abstract

In today’s multiprocessor systems-on-a-chip, the shared memory subsystem is a known source of temporal interference. The problem causes logically independent cores to affect each other’s performance, leading to pessimistic worst-case execution time analysis. Memory regulation via throttling is one of the most practical techniques to mitigate interference. Traditional regulation schemes rely on a combination of timer and performance counter interrupts to be delivered and processed on the same cores running real-time workload. Unfortunately, to prevent excessive overhead, regulation can only be enforced at a millisecond-scale granularity. In this work, we present a novel regulation mechanism from outside the cores that monitors performance counters for the application core’s activity in main memory at a microsecond scale. The approach is fully transparent to the applications on the cores, and can be implemented using widely available on-chip debug facilities. The presented mechanism also allows more complex composition of metrics to enact load-aware regulation. For instance, it allows redistributing unused bandwidth between cores while keeping the overall memory bandwidth of all cores below a given threshold. We implement our approach on a host of embedded platforms and conduct an in-depth evaluation on the Xilinx Zynq UltraScale+ ZCU102, NXP i.MX8M and NXP S32G2 platforms using the San Diego Vision Benchmark Suite.
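
The load-aware idea in the abstract, letting a core use bandwidth that other cores leave idle while the aggregate stays below a threshold, can be illustrated with a small simulation of one polling step. This is only a sketch under stated assumptions: `CoreState`, `poll_once`, the budget values, and the throttling rule are hypothetical names and choices made for illustration, not the algorithm or interface described in the paper.

```python
from dataclasses import dataclass


@dataclass
class CoreState:
    last_count: int = 0      # cumulative counter value seen at the previous poll
    throttled: bool = False  # whether the (hypothetical) regulator stalls this core


def poll_once(counters, states, per_core_budget, global_budget):
    """Run one polling step of a hypothetical out-of-core regulator.

    counters: cumulative memory-access counts read for each core.
    Returns the number of accesses each core made since the previous poll.
    """
    used = []
    for count, state in zip(counters, states):
        used.append(count - state.last_count)
        state.last_count = count

    total = sum(used)
    for accesses, state in zip(used, states):
        # Assumed rule: a core may exceed its own budget as long as the
        # aggregate stays within the global budget, i.e. it is only
        # consuming bandwidth that other cores left unused.
        state.throttled = accesses > per_core_budget and total > global_budget
    return used


states = [CoreState(), CoreState()]

# Period 1: core 0 exceeds its share, but the total (110) is under the cap.
poll_once([100, 10], states, per_core_budget=50, global_budget=120)
period1 = [s.throttled for s in states]

# Period 2: the total (130) exceeds the cap, so only the core that is
# over its own budget is throttled.
poll_once([200, 40], states, per_core_budget=50, global_budget=120)
period2 = [s.throttled for s in states]

print(period1, period2)
```

In this sketch the regulator needs only counter reads and a per-core throttle flag, which is why a rule like this can run on a separate observer rather than in interrupt handlers on the regulated cores.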

Source journal

Real-Time Systems (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore

2.90

Self-citation rate

7.70%

Publication volume

Review time

6 months

Journal description: Papers published in Real-Time Systems cover, among others, the following topics: requirements engineering, specification and verification techniques, design methods and tools, programming languages, operating systems, scheduling algorithms, architecture, hardware and interfacing, dependability and safety, distributed and other novel architectures, wired and wireless communications, wireless sensor systems, distributed databases, artificial intelligence techniques, expert systems, and application case studies. Applications are found in command and control systems, process control, automated manufacturing, flight control, avionics, space avionics and defense systems, shipborne systems, vision and robotics, pervasive and ubiquitous computing, and in an abundance of embedded systems.