Root Cause Analysis of Concurrent Alarms Based on Random Walk over Anomaly Propagation Graph

Lingyu Zhang, Jiabao Zhao, Min Zhang
Published in: 2020 IEEE International Conference on Networking, Sensing and Control (ICNSC)
Publication date: 2020-10-30
DOI: 10.1109/ICNSC48988.2020.9238084 (https://doi.org/10.1109/ICNSC48988.2020.9238084)

Abstract

As Internet technology develops, IT systems are becoming increasingly complex. Their components are linked by two main kinds of relationship: service call relationships and deployment configuration relationships. Once a local anomaly occurs in such a system, it tends to spread, triggering a sudden, dense burst of concurrent alarms. It is therefore important to locate the root cause of concurrent alarms quickly and precisely. In this paper, we first construct an anomaly propagation graph from collected system data. Based on this graph, we then propose two alternative algorithms, random walk and state iteration, to trace the anomaly propagation process and locate the root cause. Simulation experiments demonstrate that the proposed method localizes root causes correctly and rapidly in scenarios with complex call chains and resource competition, and that it is robust to alarm errors. The method focuses on the characteristics of the system itself and depends little on the experiential knowledge of IT operators.
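
The abstract does not give the details of either algorithm, so the following is only a minimal sketch of the general idea of ranking root-cause candidates by a random walk over a propagation graph. Everything specific in it is an assumption made for illustration and is not taken from the paper: the function name `random_walk_rank`, the encoding of the graph as a dictionary mapping each component to the components it depends on, the restart probability, and the toy component names.

```python
import random
from collections import Counter


def random_walk_rank(deps, alarmed, steps=10000, restart=0.15, seed=0):
    """Rank alarmed components by how often a random walker visits them.

    deps    : dict mapping a component to the components it depends on
              (hypothetical encoding of an anomaly propagation graph).
    alarmed : set of components that currently raise alarms.
    The walker moves from a component to one of its alarmed dependencies,
    i.e. against the assumed direction of anomaly propagation, and restarts
    at a random alarmed component when it gets stuck or with probability
    `restart`. These rules are illustrative assumptions, not the paper's.
    """
    rng = random.Random(seed)
    starts = sorted(alarmed)          # sorted for reproducible sampling
    visits = Counter()
    node = rng.choice(starts)
    for _ in range(steps):
        visits[node] += 1
        # Only follow edges that lead to components that are also alarmed.
        candidates = [n for n in deps.get(node, []) if n in alarmed]
        if not candidates or rng.random() < restart:
            node = rng.choice(starts)
        else:
            node = rng.choice(candidates)
    # Most visited component first.
    return visits.most_common()


# Toy call chain (hypothetical): web calls api and cache, api calls db.
deps = {"web": ["api", "cache"], "api": ["db"], "db": [], "cache": []}
alarmed = {"web", "api", "db"}

ranking = random_walk_rank(deps, alarmed)
for component, count in ranking:
    print(component, count)
```

In this toy example the walker keeps drifting down the alarmed call chain and stops at `db`, so `db` collects the most visits and is reported as the most likely root cause. A real system would also need edges for deployment relationships and weights reflecting how strongly each alarm correlates with its neighbours, neither of which is modelled here.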