MemPol: polling-based microsecond-scale per-core memory bandwidth regulation

IF 1.4 4区 计算机科学 Q3 COMPUTER SCIENCE, THEORY & METHODS
Alexander Zuepke, Andrea Bastoni, Weifan Chen, Marco Caccamo, Renato Mancuso
{"title":"MemPol: polling-based microsecond-scale per-core memory bandwidth regulation","authors":"Alexander Zuepke, Andrea Bastoni, Weifan Chen, Marco Caccamo, Renato Mancuso","doi":"10.1007/s11241-024-09422-8","DOIUrl":null,"url":null,"abstract":"<p>In today’s multiprocessor systems-on-a-chip, the shared memory subsystem is a known source of temporal interference. The problem causes logically independent cores to affect each other’s performance, leading to pessimistic worst-case execution time analysis. Memory regulation via throttling is one of the most practical techniques to mitigate interference. Traditional regulation schemes rely on a combination of timer and performance counter interrupts to be delivered and processed on the same cores running real-time workload. Unfortunately, to prevent excessive overhead, regulation can only be enforced at a millisecond-scale granularity. In this work, we present a novel regulation mechanism from <i>outside the cores</i> that monitors performance counters for the application core’s activity in main memory at a microsecond scale. The approach is fully transparent to the applications on the cores, and can be implemented using widely available on-chip debug facilities. The presented mechanism also allows more complex composition of metrics to enact load-aware regulation. For instance, it allows redistributing unused bandwidth between cores while keeping the overall memory bandwidth of all cores below a given threshold. We implement our approach on a host of embedded platforms and conduct an in-depth evaluation on the Xilinx Zynq UltraScale+ ZCU102, NXP i.MX8M and NXP S32G2 platforms using the San Diego Vision Benchmark Suite.</p>","PeriodicalId":54507,"journal":{"name":"Real-Time Systems","volume":null,"pages":null},"PeriodicalIF":1.4000,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Real-Time Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11241-024-09422-8","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

In today’s multiprocessor systems-on-a-chip, the shared memory subsystem is a known source of temporal interference. The problem causes logically independent cores to affect each other’s performance, leading to pessimistic worst-case execution time analysis. Memory regulation via throttling is one of the most practical techniques to mitigate interference. Traditional regulation schemes rely on a combination of timer and performance counter interrupts to be delivered and processed on the same cores running real-time workload. Unfortunately, to prevent excessive overhead, regulation can only be enforced at a millisecond-scale granularity. In this work, we present a novel regulation mechanism from outside the cores that monitors performance counters for the application core’s activity in main memory at a microsecond scale. The approach is fully transparent to the applications on the cores, and can be implemented using widely available on-chip debug facilities. The presented mechanism also allows more complex composition of metrics to enact load-aware regulation. For instance, it allows redistributing unused bandwidth between cores while keeping the overall memory bandwidth of all cores below a given threshold. We implement our approach on a host of embedded platforms and conduct an in-depth evaluation on the Xilinx Zynq UltraScale+ ZCU102, NXP i.MX8M and NXP S32G2 platforms using the San Diego Vision Benchmark Suite.

Abstract Image

MemPol:基于轮询的微秒级每核内存带宽调节
在当今的多处理器片上系统中,共享内存子系统是众所周知的时间干扰源。这个问题会导致逻辑上独立的内核相互影响性能,从而导致悲观的最坏情况执行时间分析。通过节流进行内存调节是缓解干扰的最实用技术之一。传统的调节方案依赖于定时器和性能计数器中断的组合,在运行实时工作负载的相同内核上进行传递和处理。遗憾的是,为了防止过多的开销,调节只能以毫秒级的粒度执行。在这项工作中,我们从内核外部提出了一种新颖的调节机制,它能以微秒级监控主内存中应用内核活动的性能计数器。这种方法对内核上的应用完全透明,可利用广泛可用的片上调试设施来实现。所提出的机制还允许更复杂的指标组合,以实施负载感知调节。例如,它允许在内核之间重新分配未使用的带宽,同时将所有内核的总体内存带宽保持在给定阈值以下。我们在大量嵌入式平台上实施了我们的方法,并使用圣地亚哥视觉基准套件在赛灵思 Zynq UltraScale+ ZCU102、恩智浦 i.MX8M 和恩智浦 S32G2 平台上进行了深入评估。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Real-Time Systems
Real-Time Systems 工程技术-计算机:理论方法
CiteScore
2.90
自引率
7.70%
发文量
15
审稿时长
6 months
期刊介绍: Papers published in Real-Time Systems cover, among others, the following topics: requirements engineering, specification and verification techniques, design methods and tools, programming languages, operating systems, scheduling algorithms, architecture, hardware and interfacing, dependability and safety, distributed and other novel architectures, wired and wireless communications, wireless sensor systems, distributed databases, artificial intelligence techniques, expert systems, and application case studies. Applications are found in command and control systems, process control, automated manufacturing, flight control, avionics, space avionics and defense systems, shipborne systems, vision and robotics, pervasive and ubiquitous computing, and in an abundance of embedded systems.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信