计划B:网络物理系统抗定时故障的设计方法

M. Khayatian, Mohammadreza Mehrabian, E. Andert, Reese Grimsley, Kyle Liang, Yifan Hu, Ian M. McCormack, Carlee Joe-Wong, Jonathan Aldrich, Bob Iannucci, Aviral Shrivastava
{"title":"计划B:网络物理系统抗定时故障的设计方法","authors":"M. Khayatian, Mohammadreza Mehrabian, E. Andert, Reese Grimsley, Kyle Liang, Yifan Hu, Ian M. McCormack, Carlee Joe-Wong, Jonathan Aldrich, Bob Iannucci, Aviral Shrivastava","doi":"10.1145/3516449","DOIUrl":null,"url":null,"abstract":"Many Cyber-Physical Systems (CPS) have timing constraints that must be met by the cyber components (software and the network) to ensure safety. It is a tedious job to check if a CPS meets its timing requirement especially when it is distributed and the software and/or the underlying computing platforms are complex. Furthermore, the system design is brittle since a timing failure can still happen (e.g., network failure, soft error bit flip). In this article, we propose a new design methodology called Plan B where timing constraints of the CPS are monitored at runtime, and a proper backup routine is executed when a timing failure happens to ensure safety. We provide a model on how to express the desired timing behavior using a set of timing constructs in a C/C++ code and how to efficiently monitor them at the runtime. We showcase the effectiveness of our approach by conducting experiments on three case studies: (1) the full software stack for autonomous driving (Apollo), (2) a multi-agent system with 1/10th-scale model robots, and (3) a quadrotor for search and rescue application. We show that the system remains safe and stable even when intentional faults are injected to cause a timing failure. We also demonstrate that the system can achieve graceful degradation when a less extreme timing failure happens.","PeriodicalId":380257,"journal":{"name":"ACM Transactions on Cyber-Physical Systems (TCPS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Plan B: Design Methodology for Cyber-Physical Systems Robust to Timing Failures\",\"authors\":\"M. Khayatian, Mohammadreza Mehrabian, E. Andert, Reese Grimsley, Kyle Liang, Yifan Hu, Ian M. McCormack, Carlee Joe-Wong, Jonathan Aldrich, Bob Iannucci, Aviral Shrivastava\",\"doi\":\"10.1145/3516449\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many Cyber-Physical Systems (CPS) have timing constraints that must be met by the cyber components (software and the network) to ensure safety. It is a tedious job to check if a CPS meets its timing requirement especially when it is distributed and the software and/or the underlying computing platforms are complex. Furthermore, the system design is brittle since a timing failure can still happen (e.g., network failure, soft error bit flip). In this article, we propose a new design methodology called Plan B where timing constraints of the CPS are monitored at runtime, and a proper backup routine is executed when a timing failure happens to ensure safety. We provide a model on how to express the desired timing behavior using a set of timing constructs in a C/C++ code and how to efficiently monitor them at the runtime. We showcase the effectiveness of our approach by conducting experiments on three case studies: (1) the full software stack for autonomous driving (Apollo), (2) a multi-agent system with 1/10th-scale model robots, and (3) a quadrotor for search and rescue application. We show that the system remains safe and stable even when intentional faults are injected to cause a timing failure. We also demonstrate that the system can achieve graceful degradation when a less extreme timing failure happens.\",\"PeriodicalId\":380257,\"journal\":{\"name\":\"ACM Transactions on Cyber-Physical Systems (TCPS)\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-03-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Cyber-Physical Systems (TCPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3516449\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Cyber-Physical Systems (TCPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3516449","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

许多网络物理系统(CPS)都有时间限制,网络组件(软件和网络)必须满足这些限制以确保安全。检查CPS是否满足其时间要求是一项繁琐的工作,特别是当它是分布式的,软件和/或底层计算平台很复杂时。此外,系统设计是脆弱的,因为定时故障仍然可能发生(例如,网络故障,软错误位翻转)。在本文中,我们提出了一种称为Plan B的新设计方法,其中在运行时监视CPS的定时约束,并在发生定时故障时执行适当的备份例程以确保安全。我们提供了一个模型,说明如何在C/ c++代码中使用一组计时结构来表达期望的计时行为,以及如何在运行时有效地监视它们。我们通过对三个案例研究进行实验来展示我们方法的有效性:(1)用于自动驾驶的完整软件堆栈(阿波罗),(2)具有1/10比例模型机器人的多智能体系统,以及(3)用于搜索和救援应用的四旋翼飞行器。我们表明,即使故意注入故障导致定时故障,系统仍保持安全稳定。我们还证明,当不太极端的定时故障发生时,系统可以实现优雅的降级。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Plan B: Design Methodology for Cyber-Physical Systems Robust to Timing Failures
Many Cyber-Physical Systems (CPS) have timing constraints that must be met by the cyber components (software and the network) to ensure safety. It is a tedious job to check if a CPS meets its timing requirement especially when it is distributed and the software and/or the underlying computing platforms are complex. Furthermore, the system design is brittle since a timing failure can still happen (e.g., network failure, soft error bit flip). In this article, we propose a new design methodology called Plan B where timing constraints of the CPS are monitored at runtime, and a proper backup routine is executed when a timing failure happens to ensure safety. We provide a model on how to express the desired timing behavior using a set of timing constructs in a C/C++ code and how to efficiently monitor them at the runtime. We showcase the effectiveness of our approach by conducting experiments on three case studies: (1) the full software stack for autonomous driving (Apollo), (2) a multi-agent system with 1/10th-scale model robots, and (3) a quadrotor for search and rescue application. We show that the system remains safe and stable even when intentional faults are injected to cause a timing failure. We also demonstrate that the system can achieve graceful degradation when a less extreme timing failure happens.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信