An Anomaly Detection Framework for Autonomic Management of Compute Cloud Systems

Derek Smith, Qiang Guan, Song Fu
Published in: 2010 IEEE 34th Annual Computer Software and Applications Conference Workshops
Publication date: 2010-07-19
DOI: 10.1109/COMPSACW.2010.72 (https://doi.org/10.1109/COMPSACW.2010.72)
Citations: 43

Abstract

In large-scale compute cloud systems, component failures are the norm rather than the exception. Failure occurrence and its impact on system performance and operation costs are an increasingly important concern for system designers and administrators. When a system fails to function properly, health-related data are valuable for troubleshooting. However, it is challenging to detect anomalies effectively in a voluminous amount of noisy, high-dimensional data. The traditional manual approach is time-consuming, error-prone, and not scalable. In this paper, we present an autonomic mechanism for anomaly detection in compute cloud systems. A set of techniques is presented to automatically analyze the collected data: data transformation to construct a uniform data format for analysis, feature extraction to reduce data size, and unsupervised learning to detect nodes that behave differently from the others. We evaluate our prototype implementation on an institute-wide compute cloud environment. The results show that our mechanism can effectively detect faulty nodes with high accuracy and low computation overhead.
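
The three stages named in the abstract (data transformation, feature extraction, unsupervised detection) can be illustrated with a minimal sketch. The abstract does not say which algorithms the authors use, so everything here is an assumption chosen for brevity: per-metric averaging stands in for feature extraction, and a robust z-score based on the median and the median absolute deviation (MAD) stands in for the unsupervised learner. The function names, the metric names `cpu` and `mem`, the node names, and the threshold of 3.5 are all hypothetical.

```python
from statistics import mean, median


def transform(raw_records, metrics):
    """Data transformation: turn each node's list of dict samples into
    uniform numeric rows with a fixed metric order (assumed format)."""
    return {
        node: [[float(sample[m]) for m in metrics] for sample in samples]
        for node, samples in raw_records.items()
    }


def extract_features(uniform):
    """Feature extraction (stand-in): shrink each node's time series to
    one mean value per metric."""
    return {
        node: [mean(column) for column in zip(*rows)]
        for node, rows in uniform.items()
    }


def detect_outliers(features, threshold=3.5):
    """Unsupervised detection (stand-in): flag nodes whose robust z-score
    exceeds the threshold on at least one metric."""
    nodes = list(features)
    n_metrics = len(features[nodes[0]])
    flagged = set()
    for j in range(n_metrics):
        values = [features[n][j] for n in nodes]
        med = median(values)
        mad = median([abs(v - med) for v in values])
        if mad == 0:
            continue  # no spread on this metric, nothing to compare against
        for node, value in zip(nodes, values):
            if 0.6745 * abs(value - med) / mad > threshold:
                flagged.add(node)
    return sorted(flagged)


# Made-up health samples: node-5 reports much higher CPU use than its peers.
raw = {
    "node-1": [{"cpu": 0.50, "mem": 0.40}, {"cpu": 0.52, "mem": 0.42}],
    "node-2": [{"cpu": 0.48, "mem": 0.41}, {"cpu": 0.50, "mem": 0.39}],
    "node-3": [{"cpu": 0.51, "mem": 0.43}, {"cpu": 0.49, "mem": 0.41}],
    "node-4": [{"cpu": 0.53, "mem": 0.38}, {"cpu": 0.51, "mem": 0.40}],
    "node-5": [{"cpu": 0.99, "mem": 0.40}, {"cpu": 0.97, "mem": 0.42}],
}

flagged = detect_outliers(extract_features(transform(raw, ["cpu", "mem"])))
print(flagged)
```

The median and MAD are used instead of the mean and standard deviation because a single faulty node would otherwise inflate the very statistics it is compared against; this is a design choice of the sketch, not a claim about the paper's method.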