Using resource monitoring to select recovery strategies

R. Tirtea, Geert Deconinck, V. De Florio, R. Belmans
Published in: Annual Symposium Reliability and Maintainability, 2004 - RAMS
Publication date: 2004-08-24
DOI: 10.1109/RAMS.2004.1285459 (https://doi.org/10.1109/RAMS.2004.1285459)
Citations: 8

Abstract

Distributed heterogeneous embedded systems that control infrastructures, such as the electric power infrastructure, must provide reliable services despite faults and changes in the environment. A fault-tolerance middleware architecture containing mechanisms for adapting quality of service (QoS) is developed to assure dependable control of the infrastructure's components. Recovery strategies allow the system to be reconfigured (e.g. graceful degradation) based on the circumstances of the failure. In this paper we present why and how available resources should also be considered, together with the type and circumstances of the failure, when selecting a recovery strategy. Changes in the environment, such as reduced resources at node level (e.g. system overload) or degradation of QoS (e.g. scarce bandwidth on communication links), should be considered before allocating a new process/task to another host or before taking reconfiguration decisions. A mathematical model for generating a composite indicator from sampled parameters is presented. The mechanism for monitoring resources at node level is described, along with how it can be used to select a recovery action (e.g. restarting processes on, or migrating them to, overloaded nodes should be avoided). Also, based on the communication characteristics between distributed sites (e.g. availability or cost), different recovery strategies can be selected. We consider the case of two recovery strategies and present a mechanism for selecting the appropriate one. The fault-tolerant architecture integrating the QoS monitoring mechanism achieves dynamic reconfiguration of the recovery strategies based on changes in the environment. The QoS monitoring mechanism also makes it easier to distinguish node crashes from network problems for nodes suspected of failure. Another advantage of this mechanism is the dynamic adaptation of resource allocation, which increases overall application availability.
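The abstract mentions a composite indicator built from sampled parameters and a mechanism for choosing between two recovery strategies, but gives neither the formula nor the decision rule. The sketch below is only one plausible reading, not the authors' model: it assumes the indicator is a weighted sum of per-parameter sample means, and that the two strategies are restarting a process locally and migrating it to another node. The parameter names, the weights, the 0.8 overload threshold, and the names `NodeSample`, `composite_indicator`, and `select_recovery` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NodeSample:
    """One sample of node-level resource usage; each value is in [0, 1]."""
    cpu_load: float
    memory_used: float
    link_utilisation: float


# Hypothetical weights (they sum to 1); the abstract does not give any values.
WEIGHTS = {"cpu_load": 0.5, "memory_used": 0.3, "link_utilisation": 0.2}


def composite_indicator(samples: Sequence[NodeSample]) -> float:
    """Average each parameter over the samples, then take a weighted sum.

    Returns a value in [0, 1]; higher means the node is more heavily loaded.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    total = 0.0
    for name, weight in WEIGHTS.items():
        mean = sum(getattr(s, name) for s in samples) / len(samples)
        total += weight * mean
    return total


def select_recovery(local: float, remote: float, threshold: float = 0.8) -> str:
    """Pick "restart" (on the original node) or "migrate" (to the remote node).

    `local` and `remote` are composite indicators. A node whose indicator
    reaches `threshold` counts as overloaded and is avoided when possible.
    """
    local_overloaded = local >= threshold
    remote_overloaded = remote >= threshold
    if local_overloaded and not remote_overloaded:
        return "migrate"
    if local_overloaded and remote_overloaded:
        # Both are overloaded: fall back to the less loaded of the two.
        return "restart" if local <= remote else "migrate"
    # The local node has capacity, so avoid the cost of moving the process.
    return "restart"
```

Preferring a local restart whenever the original node still has capacity reflects the assumption that migration is the more expensive action. A real deployment would also have to account for the link cost and availability considerations that the abstract mentions.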