Error Diagnosis of Cloud Application Operation Using Bayesian Networks and Online Optimisation

Xiwei Xu, Liming Zhu, Daniel W. Sun, An Binh Tran, I. Weber, Min Fu, L. Bass
Published in: 2015 11th European Dependable Computing Conference (EDCC), 2015-09-07
DOI: 10.1109/EDCC.2015.15 (https://doi.org/10.1109/EDCC.2015.15)
Citations: 4

Abstract

Operations such as upgrades or redeployments are an important cause of system outages, and diagnosing the resulting errors at runtime poses significant challenges. In this paper, we propose an error diagnosis approach using Bayesian Networks. Each node in the network captures a potential (root) cause of operational errors and its probability under different operational contexts. Once an operational error is detected, our diagnosis algorithm chooses a starting node, traverses the Bayesian Network, and performs the assertion checking associated with each node to confirm the error, retrieve further information, and update the belief network. The next node to check is selected through an online optimisation that minimises the overall availability risk, considering diagnosis time and fault consequence. Our experiments show that, in most cases, the technique significantly reduces the risk of faults compared to other approaches. The diagnosis accuracy is high but also depends on the transient nature of a fault.
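The diagnosis loop described in the abstract (pick a node, run its assertion check, update beliefs, pick the next node by weighing probability, diagnosis time, and fault consequence) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the network structure or the objective function, so the sketch assumes a flat set of mutually exclusive root causes, a greedy score of `belief * consequence / check_seconds`, and a single `miss_rate` parameter standing in for transient faults that a check may fail to observe. All names (`Cause`, `next_cause`, `update_on_negative`, `diagnose`) and the example causes are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Cause:
    """A candidate root cause (hypothetical structure, not from the paper)."""
    name: str
    prior: float                # assumed P(cause) in the current operational context
    check_seconds: float        # assumed time to run the assertion check
    consequence: float          # assumed cost per second while the fault persists
    check: Callable[[], bool]   # assertion check: True if the cause is confirmed


def next_cause(causes: List[Cause], belief: Dict[str, float]) -> Cause:
    # Greedy selection rule (an assumption): prefer the cause with the highest
    # expected risk addressed per second spent checking.
    return max(causes, key=lambda c: belief[c.name] * c.consequence / c.check_seconds)


def update_on_negative(belief: Dict[str, float], tested: str,
                       miss_rate: float) -> Dict[str, float]:
    # Bayes' rule for mutually exclusive causes after a negative check, where
    # miss_rate = P(check is negative | cause is present), e.g. a transient fault.
    unnorm = {n: p * (miss_rate if n == tested else 1.0) for n, p in belief.items()}
    total = sum(unnorm.values())
    if total == 0.0:
        return unnorm  # every candidate has been ruled out
    return {n: p / total for n, p in unnorm.items()}


def diagnose(causes: List[Cause], miss_rate: float = 0.2,
             max_checks: int = 10) -> Tuple[Optional[str], List[str]]:
    total = sum(c.prior for c in causes)
    belief = {c.name: c.prior / total for c in causes}
    visited: List[str] = []
    for _ in range(max_checks):
        cause = next_cause(causes, belief)
        if belief[cause.name] == 0.0:
            break  # the best remaining candidate has zero belief: nothing left to check
        visited.append(cause.name)
        if cause.check():
            return cause.name, visited
        belief = update_on_negative(belief, cause.name, miss_rate)
    return None, visited


# Hypothetical example: three candidate causes, the third is the actual fault.
causes = [
    Cause("wrong_ami", 0.5, 10.0, 1.0, lambda: False),
    Cause("key_pair_missing", 0.3, 2.0, 1.0, lambda: False),
    Cause("elb_unavailable", 0.2, 5.0, 3.0, lambda: True),
]
result = diagnose(causes, miss_rate=0.0)
print(result)
```

In this example the cheap `key_pair_missing` check runs first even though `wrong_ami` has the highest prior, because the score trades probability against check time and consequence. With `miss_rate` above zero, a negative check only lowers the belief in a cause rather than eliminating it, so the same node may be revisited, which is one simple way to model the dependence on transient faults mentioned in the abstract.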