Root-Cause Metric Location for Microservice Systems via Log Anomaly Detection

Lingzhi Wang, Nengwen Zhao, Junjie Chen, Pinnong Li, Wenchi Zhang, Kaixin Sui
Published in: 2020 IEEE International Conference on Web Services (ICWS)
Publication date: 2020-10-01
DOI: 10.1109/ICWS49710.2020.00026
URL: https://doi.org/10.1109/ICWS49710.2020.00026
Citations: 35

Abstract

Microservice systems are typically fragile, and failures in them are inevitable because of their complexity and large scale. Localizing the root-cause metric is challenging, however, because of the system's complicated dependencies and the huge number of metrics. Existing methods rely on either the correlation between metrics or the correlation between metrics and failures. All of them ignore the key data source in microservices, i.e., logs. In this paper, we propose a novel root-cause metric localization approach that incorporates log anomaly detection. Our approach builds on a key observation: the value of the root-cause metric should change along with the change in the system's log anomaly score caused by the failure. Specifically, our approach has two components: collecting anomaly scores with a log anomaly detection algorithm, and identifying the root-cause metric through robust correlation analysis with data augmentation. Experiments on an open-source benchmark microservice system demonstrate that our approach identifies root-cause metrics more accurately than existing methods and requires only a short localization time. Our approach can therefore save engineers much effort in diagnosing failures and help them mitigate failures as soon as possible.
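The key observation above (the root-cause metric should move together with the log anomaly score) can be illustrated with a minimal sketch. The abstract does not name the correlation measure or describe the data augmentation step, so this sketch substitutes plain Spearman rank correlation as a stand-in for "robust correlation analysis" and omits augmentation entirely. All function names, metric names, and sample values here are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from typing import Dict, List, Tuple


def ranks(values: List[float]) -> List[float]:
    """Return 1-based average ranks; tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rank_metrics(
    metrics: Dict[str, List[float]], anomaly_scores: List[float]
) -> List[Tuple[str, float]]:
    """Order metrics by |correlation| with the log anomaly score, highest first."""
    scored = [
        (name, abs(spearman(series, anomaly_scores)))
        for name, series in metrics.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: one anomaly score per time window, aligned with each metric.
anomaly_scores = [0.1, 0.1, 0.2, 0.9, 0.8, 0.3]
metrics = {
    "cpu_usage": [20, 22, 25, 90, 85, 30],   # rises and falls with the anomaly score
    "disk_free": [50, 50, 50, 50, 50, 50],   # constant, carries no signal
    "net_in": [5, 9, 3, 4, 8, 6],            # unrelated fluctuation
}

ranking = rank_metrics(metrics, anomaly_scores)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this made-up example the metric that tracks the anomaly score is ranked first as the root-cause candidate. A rank-based measure is used here because it is less sensitive to outliers and scale than raw Pearson correlation, which is one plausible reading of "robust"; the paper itself may use a different measure.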