Reliable Distributed Real-Time and Embedded Systems through Safe Middleware Adaptation

Akshay Dabholkar, A. Dubey, A. Gokhale, G. Karsai, N. Mahadevan
{"title":"Reliable Distributed Real-Time and Embedded Systems through Safe Middleware Adaptation","authors":"Akshay Dabholkar, A. Dubey, A. Gokhale, G. Karsai, N. Mahadevan","doi":"10.1109/SRDS.2012.59","DOIUrl":null,"url":null,"abstract":"Distributed real-time and embedded (DRE) systems are a class of real-time systems formed through a composition of predominantly legacy, closed and statically scheduled real-time subsystems, which comprise over-provisioned resources to deal with worst-case failure scenarios. The formation of the system-of-systems leads to a new range of faults that manifest at different granularities for which no statically defined fault tolerance scheme applies. Thus, dynamic and adaptive fault tolerance mechanisms are needed which must execute within the available resources without compromising the safety and timeliness of existing real-time tasks in the individual subsystems. To address these requirements, this paper describes a middleware solution called Safe Middleware Adaptation for Real-Time Fault Tolerance (SafeMAT), which opportunistically leverages the available slack in the over-provisioned resources of individual subsystems. SafeMAT comprises three primary artifacts: (1) a flexible and configurable distributed, runtime resource monitoring framework that can pinpoint in real-time the available slack in the system that is used in making dynamic and adaptive fault tolerance decisions, (2) a safe and resource aware dynamic failure adaptation algorithm that enables efficient recovery from different granularities of failures within the available slack in the execution schedule while ensuring real-time constraints are not violated and resources are not overloaded, and (3) a framework that empirically validates the correctness of the dynamic mechanisms and the safety of the DRE system. Experimental results evaluating SafeMAT on an avionics application indicates that SafeMAT incurs only 9-15% runtime fail over and 2-6% processor utilization overheads thereby providing safe and predictable failure adaptability in real-time.","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 31st Symposium on Reliable Distributed Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SRDS.2012.59","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Distributed real-time and embedded (DRE) systems are a class of real-time systems formed through a composition of predominantly legacy, closed and statically scheduled real-time subsystems, which comprise over-provisioned resources to deal with worst-case failure scenarios. The formation of the system-of-systems leads to a new range of faults that manifest at different granularities for which no statically defined fault tolerance scheme applies. Thus, dynamic and adaptive fault tolerance mechanisms are needed which must execute within the available resources without compromising the safety and timeliness of existing real-time tasks in the individual subsystems. To address these requirements, this paper describes a middleware solution called Safe Middleware Adaptation for Real-Time Fault Tolerance (SafeMAT), which opportunistically leverages the available slack in the over-provisioned resources of individual subsystems. SafeMAT comprises three primary artifacts: (1) a flexible and configurable distributed, runtime resource monitoring framework that can pinpoint in real-time the available slack in the system that is used in making dynamic and adaptive fault tolerance decisions, (2) a safe and resource aware dynamic failure adaptation algorithm that enables efficient recovery from different granularities of failures within the available slack in the execution schedule while ensuring real-time constraints are not violated and resources are not overloaded, and (3) a framework that empirically validates the correctness of the dynamic mechanisms and the safety of the DRE system. Experimental results evaluating SafeMAT on an avionics application indicates that SafeMAT incurs only 9-15% runtime fail over and 2-6% processor utilization overheads thereby providing safe and predictable failure adaptability in real-time.
基于安全中间件适配的可靠分布式实时嵌入式系统
分布式实时和嵌入式(DRE)系统是一类实时系统,主要由遗留的、封闭的和静态调度的实时子系统组成,这些子系统包括过度供应的资源来处理最坏的故障场景。系统的系统的形成导致了一系列新的故障,这些故障在不同的粒度上表现出来,而静态定义的容错方案不适用于这些故障。因此,需要动态和自适应容错机制,该机制必须在可用资源内执行,而不会损害各个子系统中现有实时任务的安全性和及时性。为了满足这些需求,本文描述了一种称为安全中间件适应实时容错(SafeMAT)的中间件解决方案,该解决方案利用了单个子系统过度供应的资源中的可用空闲。SafeMAT包括三个主要构件:(1)一个灵活可配置的分布式运行时资源监控框架,可以实时精确定位系统中可用的空闲时间,用于动态和自适应容错决策;(2)一个安全和资源感知的动态故障自适应算法,可以在执行计划中的可用空闲时间内从不同粒度的故障中高效恢复,同时确保不违反实时约束和资源不过载;(3)实证验证DRE系统动态机制正确性和安全性的框架。在航空电子应用中评估SafeMAT的实验结果表明,SafeMAT仅产生9-15%的运行时故障转移和2-6%的处理器使用开销,从而提供安全和可预测的实时故障适应性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信