System management recovery protocol for MPSoCs

Vinicius Fochi, L. L. Caimi, Marcelo Ruaro, E. Wächter, F. Moraes
Published in: 2017 30th IEEE International System-on-Chip Conference (SOCC), 2017-09-01
DOI: 10.1109/SOCC.2017.8226080 (https://doi.org/10.1109/SOCC.2017.8226080)
Citations: 8

Abstract

Advances in silicon technology have led to systems with hundreds of processors: NoC-based MPSoCs. However, the higher fault probability of deep sub-micron technologies shortens the lifetime of integrated circuits. Operating systems enable distributed applications to execute on the MPSoC processing elements (PEs). Large systems require PEs dedicated to management tasks such as task mapping, handling monitoring data, and running self-awareness adaptation. This paper addresses a hierarchically organized MPSoC: PEs with an embedded operating system execute the applications (SPEs), and dedicated PEs manage the system resources at runtime (MPEs). A rich literature presents fault-tolerant proposals for the hardware and software components of MPSoCs, but there is a significant gap in fault-tolerant approaches at the system level, i.e., for the PEs responsible for managing the system. Consider, for example, an MPE responsible for managing a set of SPEs. A fault in that MPE prevents access to its set of SPEs, so new applications cannot be executed on them. The goal of this paper is to present a method to determine when an MPE has become faulty and to propose a protocol that safely migrates the management software to an SPE. The management data is preserved without saving the context in redundant structures. The proposal is transparent to the applications executing in the system, with a small execution overhead observed during the management migration, as presented in the results section.
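The abstract does not say how a faulty manager PE is detected or how the management data is recovered, so the following is only a toy sketch of the general idea, not the paper's protocol. It assumes a heartbeat timeout for detection, a least-loaded policy for choosing the PE that takes over, and that the management table can be rebuilt from the task lists each application PE already holds locally. All names (`AppPE`, `Cluster`, `recover`, `timeout`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AppPE:
    """A PE that runs application tasks and knows only its own task list."""
    pe_id: int
    tasks: list = field(default_factory=list)


@dataclass
class Cluster:
    """One manager PE plus the application PEs it manages (hypothetical model)."""
    manager_id: int
    app_pes: dict            # pe_id -> AppPE
    last_heartbeat: int = 0  # tick of the manager's last reply (assumed mechanism)
    timeout: int = 3         # assumed detection threshold, in ticks

    def manager_faulty(self, now: int) -> bool:
        # Assumption: the manager is declared faulty after a silent period.
        return now - self.last_heartbeat > self.timeout

    def recover(self, now: int):
        """Promote an application PE to manager; return the rebuilt task table."""
        if not self.manager_faulty(now):
            return None
        if len(self.app_pes) < 2:
            raise ValueError("need at least two application PEs to recover")
        # Assumed policy: the least-loaded PE takes over (ties broken by id).
        load = lambda pe: (len(pe.tasks), pe.pe_id)
        new_mgr = min(self.app_pes.values(), key=load)
        del self.app_pes[new_mgr.pe_id]
        # Move any tasks off the new manager so it can run management software.
        if new_mgr.tasks:
            target = min(self.app_pes.values(), key=load)
            target.tasks.extend(new_mgr.tasks)
            new_mgr.tasks = []
        self.manager_id = new_mgr.pe_id
        self.last_heartbeat = now
        # Rebuild the management table from the PEs' local state, with no
        # redundant copy of the old manager's context.
        return {pe_id: list(pe.tasks) for pe_id, pe in self.app_pes.items()}
```

In this model, calling `recover` before the timeout elapses returns `None` and changes nothing. After the timeout, the cluster has a new `manager_id` and the returned dictionary maps each remaining application PE to the tasks it reports.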