Automated Traces-based Anomaly Detection and Root Cause Analysis in Cloud Platforms
Authors: Mbarka Soualhia, F. Wuhib
Published in: 2022 IEEE International Conference on Cloud Engineering (IC2E), September 2022
DOI: 10.1109/IC2E55432.2022.00034
Citation count: 2
Abstract
Current cloud infrastructures and their applications are increasingly complex, with confounding relationships among application elements and cloud infrastructure components. This makes timely identification of the root causes of faults in such systems an important yet challenging task. In this paper, we propose a solution that automatically builds a correlation model and an anomaly detection model from the kernel traces of cloud servers. The correlation model captures the dependencies between the various elements of the cloud system, while the anomaly detection model identifies anomalies related to specific elements of the system. When a fault is detected, our framework uses the two models to compute a dependency graph of the detected anomalies, which it then uses to perform root cause analysis. Evaluation of the proposed framework on a Kubernetes cloud shows that it finds the root causes of injected faults with an accuracy between 80% and 99.3% and a low false negative rate.
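The abstract does not say how either model is built, so the Python sketch below only illustrates the general shape of the pipeline it describes: flag anomalous elements, restrict the dependency graph to those elements, and pick root-cause candidates. Every detail here is an assumption, not the authors' method. A simple z-score test stands in for the anomaly detection model. A hand-written `depends_on` map stands in for the correlation model. The element names and metric values (imagined as per-window syscall latency derived from kernel traces) are hypothetical. The rule "an anomalous element that depends on no other anomalous element is a root-cause candidate" is a common heuristic.

```python
from statistics import mean, stdev


def detect_anomalies(history, current, threshold=3.0):
    """Stand-in anomaly model (assumption): flag elements whose current metric
    lies more than `threshold` sample standard deviations from its history."""
    anomalous = set()
    for element, samples in history.items():
        mu, sigma = mean(samples), stdev(samples)
        value = current[element]
        if sigma == 0:
            # Constant history: any change at all counts as anomalous.
            if value != mu:
                anomalous.add(element)
        elif abs(value - mu) / sigma > threshold:
            anomalous.add(element)
    return anomalous


def anomaly_dependency_graph(depends_on, anomalous):
    """Keep only dependency edges whose endpoints are both anomalous."""
    return {
        src: {dst for dst in depends_on.get(src, set()) if dst in anomalous}
        for src in anomalous
    }


def root_cause_candidates(graph):
    """Heuristic (assumption): an anomalous element is a root-cause candidate
    if it depends on no other anomalous element, i.e. nothing it relies on
    could explain its anomaly."""
    return sorted(src for src, deps in graph.items() if not deps)


if __name__ == "__main__":
    # Hypothetical per-window metric per element (e.g., mean syscall latency, ms).
    history = {
        "frontend-pod": [10, 11, 10, 12, 11],
        "backend-pod": [20, 21, 19, 20, 20],
        "node-disk": [5, 5, 6, 5, 5],
        "cache-pod": [3, 3, 3, 4, 3],
    }
    current = {"frontend-pod": 30, "backend-pod": 45, "node-disk": 40, "cache-pod": 3}
    # Hypothetical dependencies standing in for a learned correlation model:
    # "a depends on b" means a fault in b can surface as an anomaly in a.
    depends_on = {
        "frontend-pod": {"backend-pod", "cache-pod"},
        "backend-pod": {"node-disk"},
    }
    anomalous = detect_anomalies(history, current)
    graph = anomaly_dependency_graph(depends_on, anomalous)
    print(sorted(anomalous))
    print(root_cause_candidates(graph))
```

With these made-up inputs, the script prints `['backend-pod', 'frontend-pod', 'node-disk']` and then `['node-disk']`. Three elements are anomalous, but only the disk has no anomalous dependency, so the frontend and backend anomalies are treated as symptoms propagated from it. A real system would learn the dependencies and would likely rank candidates rather than apply a single yes/no rule.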