An Anomaly Detection Framework for Autonomic Management of Compute Cloud Systems

2010 IEEE 34th Annual Computer Software and Applications Conference Workshops
Pub date: 2010-07-19
DOI: 10.1109/COMPSACW.2010.72

Derek Smith, Qiang Guan, Song Fu

Citations: 43

Abstract

In large-scale compute cloud systems, component failures become the norm rather than the exception. The occurrence of failures, and their impact on system performance and operating costs, are an increasingly important concern for system designers and administrators. When a system fails to function properly, health-related data are valuable for troubleshooting. However, it is challenging to detect anomalies effectively in the voluminous amount of noisy, high-dimensional data, and the traditional manual approach is time-consuming, error-prone, and not scalable. In this paper, we present an autonomic mechanism for anomaly detection in compute cloud systems. A set of techniques is presented to analyze the collected data automatically: data transformation to construct a uniform data format for analysis, feature extraction to reduce data size, and unsupervised learning to detect nodes that behave differently from the others. We evaluate our prototype implementation in an institute-wide compute cloud environment. The results show that our mechanism can effectively detect faulty nodes with high accuracy and low computation overhead.
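
The abstract names three stages (data transformation, feature extraction, unsupervised learning) but not the specific algorithms behind them. The sketch below is one plausible instantiation under stated assumptions, not the authors' implementation: per-node metric records are aligned to a fixed metric order with missing values filled by the column mean and then z-scored (transformation); PCA computed by power iteration stands in for feature extraction; and a median-distance outlier rule stands in for the unsupervised detector. The function names (to_matrix, top_components, project, detect_anomalies), the mean-fill policy, and the default factor=3.0 are all hypothetical choices made for illustration.

```python
import math
from statistics import mean, median, pstdev


def to_matrix(records, metrics):
    """Data transformation (assumed scheme): put every node's metrics in one
    fixed order, fill a missing metric with the mean of its observed values,
    then z-score each column so metrics with different units are comparable."""
    fill = {}
    for m in metrics:
        seen = [float(r[m]) for r in records if m in r]
        fill[m] = mean(seen) if seen else 0.0
    rows = [[float(r.get(m, fill[m])) for m in metrics] for r in records]
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    # A constant column carries no information, so it becomes all zeros.
    return [[(v - mu) / sd if sd > 0 else 0.0 for v, (mu, sd) in zip(row, stats)]
            for row in rows]


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def top_components(X, k, iters=200):
    """Feature extraction (PCA stand-in): top-k eigenvectors of the covariance
    of the column-centred matrix X, found by power iteration with deflation."""
    n, d = len(X), len(X[0])
    C = [[sum(row[i] * row[j] for row in X) / n for j in range(d)] for i in range(d)]
    comps = []
    for _ in range(min(k, d)):
        v = _normalize([1.0 + 0.1 * i for i in range(d)])  # arbitrary non-uniform start
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            if not any(w):  # remaining covariance is zero: any v is an eigenvector
                break
            v = _normalize(w)
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflate so the next pass converges to the next direction.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def project(X, comps):
    """Reduce each node's row to its coordinates along the chosen components."""
    return [[sum(a * b for a, b in zip(row, c)) for c in comps] for row in X]


def detect_anomalies(records, metrics, k=2, factor=3.0):
    """Unsupervised step (assumed rule): flag nodes whose distance from the
    coordinate-wise median in the reduced space exceeds `factor` times the
    median distance. If most nodes coincide exactly, the median distance is 0
    and nothing is flagged (a limitation of this sketch)."""
    X = to_matrix(records, metrics)
    Y = project(X, top_components(X, k))
    center = [median(col) for col in zip(*Y)]
    dist = [math.dist(y, center) for y in Y]
    ref = median(dist)
    return [i for i, dv in enumerate(dist) if ref > 0 and dv > factor * ref]


nodes = [{"cpu": c, "mem": 2 * c} for c in (10, 11, 9, 10, 12)] + [{"cpu": 90, "mem": 180}]
print(detect_anomalies(nodes, ["cpu", "mem"], k=1))  # [5]
```

The median-based centre and threshold are a deliberate choice in this sketch: a mean and standard-deviation rule would let a single faulty node pull the reference point toward itself and inflate the spread, which can mask exactly the node that acts differently from the rest.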
