{"title":"基于日志异常检测的微服务系统根本原因度量定位","authors":"Lingzhi Wang, Nengwen Zhao, Junjie Chen, Pinnong Li, Wenchi Zhang, Kaixin Sui","doi":"10.1109/ICWS49710.2020.00026","DOIUrl":null,"url":null,"abstract":"Microservice systems are typically fragile and failures are inevitable in them due to their complexity and large scale. However, it is challenging to localize the root-cause metric due to its complicated dependencies and the huge number of various metrics. Existing methods are based on either correlation between metrics or correlation between metrics and failures. All of them ignore the key data source in microservice, i.e., logs. In this paper, we propose a novel root-cause metric localization approach by incorporating log anomaly detection. Our approach is based on a key observation, the value of root-cause metric should be changed along with the change of the log anomaly score of the system caused by the failure. Specifically, our approach includes two components, collecting anomaly scores by log anomaly detection algorithm and identifying root-cause metric by robust correlation analysis with data augmentation. Experiments on an open-source benchmark microservice system have demonstrated our approach can identify root-cause metrics more accurately than existing methods and only require a short localization time. Therefore, our approach can assist engineers to save much effort in diagnosing and mitigating failures as soon as possible.","PeriodicalId":338833,"journal":{"name":"2020 IEEE International Conference on Web Services (ICWS)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"35","resultStr":"{\"title\":\"Root-Cause Metric Location for Microservice Systems via Log Anomaly Detection\",\"authors\":\"Lingzhi Wang, Nengwen Zhao, Junjie Chen, Pinnong Li, Wenchi Zhang, Kaixin Sui\",\"doi\":\"10.1109/ICWS49710.2020.00026\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Microservice systems are typically fragile and failures are inevitable in them due to their complexity and large scale. However, it is challenging to localize the root-cause metric due to its complicated dependencies and the huge number of various metrics. Existing methods are based on either correlation between metrics or correlation between metrics and failures. All of them ignore the key data source in microservice, i.e., logs. In this paper, we propose a novel root-cause metric localization approach by incorporating log anomaly detection. Our approach is based on a key observation, the value of root-cause metric should be changed along with the change of the log anomaly score of the system caused by the failure. Specifically, our approach includes two components, collecting anomaly scores by log anomaly detection algorithm and identifying root-cause metric by robust correlation analysis with data augmentation. Experiments on an open-source benchmark microservice system have demonstrated our approach can identify root-cause metrics more accurately than existing methods and only require a short localization time. Therefore, our approach can assist engineers to save much effort in diagnosing and mitigating failures as soon as possible.\",\"PeriodicalId\":338833,\"journal\":{\"name\":\"2020 IEEE International Conference on Web Services (ICWS)\",\"volume\":\"57 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"35\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE International Conference on Web Services (ICWS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICWS49710.2020.00026\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Conference on Web Services (ICWS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICWS49710.2020.00026","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Root-Cause Metric Location for Microservice Systems via Log Anomaly Detection
Microservice systems are typically fragile and failures are inevitable in them due to their complexity and large scale. However, it is challenging to localize the root-cause metric due to its complicated dependencies and the huge number of various metrics. Existing methods are based on either correlation between metrics or correlation between metrics and failures. All of them ignore the key data source in microservice, i.e., logs. In this paper, we propose a novel root-cause metric localization approach by incorporating log anomaly detection. Our approach is based on a key observation, the value of root-cause metric should be changed along with the change of the log anomaly score of the system caused by the failure. Specifically, our approach includes two components, collecting anomaly scores by log anomaly detection algorithm and identifying root-cause metric by robust correlation analysis with data augmentation. Experiments on an open-source benchmark microservice system have demonstrated our approach can identify root-cause metrics more accurately than existing methods and only require a short localization time. Therefore, our approach can assist engineers to save much effort in diagnosing and mitigating failures as soon as possible.