Hongxia He, Xi Li, Peng Chen, Juan Chen, Ming Liu, Lei Wu
{"title":"高效定位云基础设施的系统异常:基于动态图变换器的新型并行框架","authors":"Hongxia He, Xi Li, Peng Chen, Juan Chen, Ming Liu, Lei Wu","doi":"10.1186/s13677-024-00677-x","DOIUrl":null,"url":null,"abstract":"Cloud environment is a virtual, online, and distributed computing environment that provides users with large-scale services. And cloud monitoring plays an integral role in protecting infrastructures in the cloud environment. Cloud monitoring systems need to closely monitor various KPIs of cloud resources, to accurately detect anomalies. However, due to the complexity and highly dynamic nature of the cloud environment, anomaly detection for these KPIs with various patterns and data quality is a huge challenge, especially those massive unlabeled data. Besides, it’s also difficult to improve the accuracy of the existing anomaly detection methods. To solve these problems, we propose a novel Dynamic Graph Transformer based Parallel Framework (DGT-PF) for efficiently detect system anomalies in cloud infrastructures, which utilizes Transformer with anomaly attention mechanism and Graph Neural Network (GNN) to learn the spatio-temporal features of KPIs to improve the accuracy and timeliness of model anomaly detection. Specifically, we propose an effective dynamic relationship embedding strategy to dynamically learn spatio-temporal features and adaptively generate adjacency matrices, and soft cluster each GNN layer through Diffpooling module. In addition, we also use nonlinear neural network model and AR-MLP model in parallel to obtain better detection accuracy and improve detection performance. The experiment shows that the DGT-PF framework have achieved the highest F1-Score on 5 public datasets, with an average improvement of 21.6% compared to 11 anomaly detection models.","PeriodicalId":501257,"journal":{"name":"Journal of Cloud Computing","volume":"70 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Efficiently localizing system anomalies for cloud infrastructures: a novel Dynamic Graph Transformer based Parallel Framework\",\"authors\":\"Hongxia He, Xi Li, Peng Chen, Juan Chen, Ming Liu, Lei Wu\",\"doi\":\"10.1186/s13677-024-00677-x\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cloud environment is a virtual, online, and distributed computing environment that provides users with large-scale services. And cloud monitoring plays an integral role in protecting infrastructures in the cloud environment. Cloud monitoring systems need to closely monitor various KPIs of cloud resources, to accurately detect anomalies. However, due to the complexity and highly dynamic nature of the cloud environment, anomaly detection for these KPIs with various patterns and data quality is a huge challenge, especially those massive unlabeled data. Besides, it’s also difficult to improve the accuracy of the existing anomaly detection methods. To solve these problems, we propose a novel Dynamic Graph Transformer based Parallel Framework (DGT-PF) for efficiently detect system anomalies in cloud infrastructures, which utilizes Transformer with anomaly attention mechanism and Graph Neural Network (GNN) to learn the spatio-temporal features of KPIs to improve the accuracy and timeliness of model anomaly detection. Specifically, we propose an effective dynamic relationship embedding strategy to dynamically learn spatio-temporal features and adaptively generate adjacency matrices, and soft cluster each GNN layer through Diffpooling module. In addition, we also use nonlinear neural network model and AR-MLP model in parallel to obtain better detection accuracy and improve detection performance. The experiment shows that the DGT-PF framework have achieved the highest F1-Score on 5 public datasets, with an average improvement of 21.6% compared to 11 anomaly detection models.\",\"PeriodicalId\":501257,\"journal\":{\"name\":\"Journal of Cloud Computing\",\"volume\":\"70 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-06-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Cloud Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1186/s13677-024-00677-x\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Cloud Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s13677-024-00677-x","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Efficiently localizing system anomalies for cloud infrastructures: a novel Dynamic Graph Transformer based Parallel Framework
Cloud environment is a virtual, online, and distributed computing environment that provides users with large-scale services. And cloud monitoring plays an integral role in protecting infrastructures in the cloud environment. Cloud monitoring systems need to closely monitor various KPIs of cloud resources, to accurately detect anomalies. However, due to the complexity and highly dynamic nature of the cloud environment, anomaly detection for these KPIs with various patterns and data quality is a huge challenge, especially those massive unlabeled data. Besides, it’s also difficult to improve the accuracy of the existing anomaly detection methods. To solve these problems, we propose a novel Dynamic Graph Transformer based Parallel Framework (DGT-PF) for efficiently detect system anomalies in cloud infrastructures, which utilizes Transformer with anomaly attention mechanism and Graph Neural Network (GNN) to learn the spatio-temporal features of KPIs to improve the accuracy and timeliness of model anomaly detection. Specifically, we propose an effective dynamic relationship embedding strategy to dynamically learn spatio-temporal features and adaptively generate adjacency matrices, and soft cluster each GNN layer through Diffpooling module. In addition, we also use nonlinear neural network model and AR-MLP model in parallel to obtain better detection accuracy and improve detection performance. The experiment shows that the DGT-PF framework have achieved the highest F1-Score on 5 public datasets, with an average improvement of 21.6% compared to 11 anomaly detection models.