{"title":"计算云系统自主管理的异常检测框架","authors":"Derek Smith, Qiang Guan, Song Fu","doi":"10.1109/COMPSACW.2010.72","DOIUrl":null,"url":null,"abstract":"In large-scale compute cloud systems, component failures become norms instead of exceptions. Failure occurrence as well as its impact on system performance and operation costs are becoming an increasingly important concern to system designers and administrators. When a system fails to function properly, health-related data are valuable for troubleshooting. However, it is challenging to effectively detect anomalies from the voluminous amount of noisy, high-dimensional data. The traditional manual approach is time-consuming, error-prone, and not scalable. In this paper, we present an autonomic mechanism for anomaly detection in compute cloud systems. A set of techniques is presented to automatically analyze collected data: data transformation to construct a uniform data format for data analysis, feature extraction to reduce data size, and unsupervised learning to detect the nodes acting differently from others. We evaluate our prototype implementation on an institute-wide compute cloud environment. The results show that our mechanism can effectively detect faulty nodes with high accuracy and low computation overhead.","PeriodicalId":121135,"journal":{"name":"2010 IEEE 34th Annual Computer Software and Applications Conference Workshops","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"43","resultStr":"{\"title\":\"An Anomaly Detection Framework for Autonomic Management of Compute Cloud Systems\",\"authors\":\"Derek Smith, Qiang Guan, Song Fu\",\"doi\":\"10.1109/COMPSACW.2010.72\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In large-scale compute cloud systems, component failures become norms instead of exceptions. Failure occurrence as well as its impact on system performance and operation costs are becoming an increasingly important concern to system designers and administrators. When a system fails to function properly, health-related data are valuable for troubleshooting. However, it is challenging to effectively detect anomalies from the voluminous amount of noisy, high-dimensional data. The traditional manual approach is time-consuming, error-prone, and not scalable. In this paper, we present an autonomic mechanism for anomaly detection in compute cloud systems. A set of techniques is presented to automatically analyze collected data: data transformation to construct a uniform data format for data analysis, feature extraction to reduce data size, and unsupervised learning to detect the nodes acting differently from others. We evaluate our prototype implementation on an institute-wide compute cloud environment. The results show that our mechanism can effectively detect faulty nodes with high accuracy and low computation overhead.\",\"PeriodicalId\":121135,\"journal\":{\"name\":\"2010 IEEE 34th Annual Computer Software and Applications Conference Workshops\",\"volume\":\"3 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-07-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"43\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE 34th Annual Computer Software and Applications Conference Workshops\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/COMPSACW.2010.72\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE 34th Annual Computer Software and Applications Conference Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COMPSACW.2010.72","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An Anomaly Detection Framework for Autonomic Management of Compute Cloud Systems
In large-scale compute cloud systems, component failures become norms instead of exceptions. Failure occurrence as well as its impact on system performance and operation costs are becoming an increasingly important concern to system designers and administrators. When a system fails to function properly, health-related data are valuable for troubleshooting. However, it is challenging to effectively detect anomalies from the voluminous amount of noisy, high-dimensional data. The traditional manual approach is time-consuming, error-prone, and not scalable. In this paper, we present an autonomic mechanism for anomaly detection in compute cloud systems. A set of techniques is presented to automatically analyze collected data: data transformation to construct a uniform data format for data analysis, feature extraction to reduce data size, and unsupervised learning to detect the nodes acting differently from others. We evaluate our prototype implementation on an institute-wide compute cloud environment. The results show that our mechanism can effectively detect faulty nodes with high accuracy and low computation overhead.