Towards Proactive Fault Management of Enterprise Systems

R. Jia, S. Abdelwahed, A. Erradi
2015 International Conference on Cloud and Autonomic Computing, published 2015-09-21
DOI: 10.1109/ICCAC.2015.18 (https://doi.org/10.1109/ICCAC.2015.18)
Citations: 2

Abstract

This paper introduces a model-based approach for autonomic fault management of computing systems. The proposed approach can recover a system from common faults while minimizing the impact on the system's quality of service and reducing potential revenue loss. When faults occur, the approach identifies the fault type and then uses a predictive control algorithm to compute the optimal recovery action, that is, the action with the least impact on performance and operating cost. The paper presents the formal setting of the model-based fault management approach and the underlying predictive control algorithm. The approach has been verified on a testbed with simulated faults, including memory leaks and network congestion. Simulation results show that the approach enables effective automatic recovery from these faults with minimal impact on system performance.
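To make the predictive-control idea concrete, the following is a minimal sketch of a limited-lookahead controller that picks a recovery action by simulating a system model over a short horizon and minimizing the sum of a performance penalty and an operating cost. It is not the authors' algorithm: the action names (`no_op`, `restart_service`, `add_bandwidth`), the toy `predict` model, the thresholds, and all cost values are hypothetical, chosen only to mimic the memory-leak and network-congestion faults mentioned above. The fault-identification step is folded into the state (leak rate and network delay) rather than modeled separately.

```python
from itertools import product
from typing import Dict, Sequence

State = Dict[str, float]

# Hypothetical action set with illustrative per-action operating costs.
ACTION_COST: Dict[str, float] = {
    "no_op": 0.0,
    "restart_service": 2.0,  # e.g. cost of brief downtime
    "add_bandwidth": 1.0,    # e.g. cost of provisioning extra capacity
}

MEM_LIMIT = 0.9     # assumed memory-utilization threshold for a QoS violation
QOS_PENALTY = 10.0  # assumed penalty when that threshold is exceeded
DELAY_WEIGHT = 1.0  # assumed weight on network delay


def predict(state: State, action: str) -> State:
    """Toy one-step system model (assumed, not taken from the paper)."""
    if action == "restart_service":
        mem = 0.2  # a restart reclaims leaked memory
    else:
        mem = min(1.0, state["mem_used"] + state["leak_rate"])  # the leak keeps growing
    delay = state["net_delay"]
    if action == "add_bandwidth":
        delay *= 0.5  # extra capacity halves congestion delay
    return {"mem_used": mem, "net_delay": delay, "leak_rate": state["leak_rate"]}


def stage_cost(state: State, action: str) -> float:
    """Performance penalty of the predicted state plus operating cost of the action."""
    perf = QOS_PENALTY if state["mem_used"] > MEM_LIMIT else 0.0
    perf += DELAY_WEIGHT * state["net_delay"]
    return perf + ACTION_COST[action]


def choose_action(state: State, horizon: int = 2,
                  actions: Sequence[str] = tuple(ACTION_COST)) -> str:
    """Limited-lookahead control: score every action sequence over the horizon
    and return the first action of the cheapest sequence."""
    best_cost, best_first = float("inf"), actions[0]
    for seq in product(actions, repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            s = predict(s, a)
            total += stage_cost(s, a)
        if total < best_cost:  # strict '<' keeps the earliest sequence on ties
            best_cost, best_first = total, seq[0]
    return best_first


if __name__ == "__main__":
    leak = {"mem_used": 0.85, "net_delay": 0.0, "leak_rate": 0.1}
    congestion = {"mem_used": 0.3, "net_delay": 4.0, "leak_rate": 0.0}
    print(choose_action(leak))        # restart_service
    print(choose_action(congestion))  # add_bandwidth
```

Only the first action of the best sequence is applied; in a receding-horizon loop the controller would observe the new state and re-plan. The exhaustive search grows as the number of actions raised to the horizon length, so this form only suits small action sets and short horizons.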