Work-in-Progress: Cloud Computing for Time-Triggered Safety-Critical Systems

2021 IEEE Real-Time Systems Symposium (RTSS) Pub Date: 2021-12-01 DOI: 10.1109/rtss52674.2021.00054

Gautam Gala, Javier Castillo Rivera, G. Fohler


Citations: 4

Abstract

Safety-critical (SC) applications require high availability, the possibility of run-time reconfiguration, and significant resource over-provisioning. Furthermore, they suffer from hardware obsolescence due to their use of custom or specialized hardware. Cloud computing could resolve these issues. Moreover, it could improve SC systems that suffer from scalability issues, e.g., the ever-growing SC railway network. However, SC applications require low latencies and guarantees that are currently not possible on clouds. In this paper, we explore the possibility of enhancing the current cloud computing paradigm by adding a resource management layer that supports the deterministic execution of SC applications while retaining the benefits of cloud computing principles. We provide a cloud-wide global resource manager that monitors, controls, and coordinates node-level Local Resource Managers (LRMs) placed on each private cloud node. In addition, we give guarantees to SC Virtual Machines (VMs) on each node via a novel CPU- and memory-bandwidth-aware Time-Triggered (TT) offline scheduling algorithm that generates a scheduling table for use by an LRM. To improve the utilization of cloud resources, the LRMs provide the flexibility to schedule Event-Triggered (ET) SC and non-critical VMs at run time without regenerating the offline scheduling table. We implemented our approach in a KVM-based private cloud and performed experiments to determine the relevant overheads.
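The table-driven idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single CPU, integer time ticks, and a fixed hyperperiod, and it ignores memory bandwidth entirely. The names `Slot`, `build_table`, and `dispatch` are hypothetical and chosen only for illustration. The sketch shows how an offline table can reserve slots for TT VMs while leaving free slots that a local manager could hand to ET or non-critical VMs at run time, without rebuilding the table.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Slot:
    start: int          # offset within the hyperperiod, in ticks
    length: int         # duration in ticks
    vm: Optional[str]   # TT VM owning the slot, or None for a free slot


def build_table(hyperperiod: int,
                reservations: Sequence[Tuple[int, int, str]]) -> List[Slot]:
    """Offline step (hypothetical): turn (start, length, vm) reservations
    into a table covering the whole hyperperiod, with gaps marked free."""
    table: List[Slot] = []
    cursor = 0
    for start, length, vm in sorted(reservations):
        if start < cursor:
            raise ValueError("overlapping TT reservations")
        if start > cursor:
            # Gap before this reservation becomes a free slot.
            table.append(Slot(cursor, start - cursor, None))
        table.append(Slot(start, length, vm))
        cursor = start + length
    if cursor > hyperperiod:
        raise ValueError("reservations exceed the hyperperiod")
    if cursor < hyperperiod:
        table.append(Slot(cursor, hyperperiod - cursor, None))
    return table


def dispatch(table: Sequence[Slot], hyperperiod: int, now: int,
             et_ready: Sequence[str]) -> Optional[str]:
    """Run-time step (hypothetical): pick the VM to run at tick `now`.
    TT slots always go to their owner; free slots go to the first
    ready ET or non-critical VM, if any."""
    offset = now % hyperperiod
    for slot in table:
        if slot.start <= offset < slot.start + slot.length:
            if slot.vm is not None:
                return slot.vm
            return et_ready[0] if et_ready else None
    raise AssertionError("table does not cover the hyperperiod")


# Example: two TT VMs in a 10-tick hyperperiod; the VM names are made up.
table = build_table(10, [(0, 3, "sc_vm_a"), (5, 2, "sc_vm_b")])
```

In this sketch the TT guarantees come purely from the fixed table, so adding or removing ET VMs only changes the `et_ready` list passed to `dispatch`. A real scheduler would also need to account for memory bandwidth, multiple cores, and switching overheads, which are left out here.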

