量化故障事件的时间和空间相关性的前瞻性管理

S. Fu, Chengzhong Xu
{"title":"量化故障事件的时间和空间相关性的前瞻性管理","authors":"S. Fu, Chengzhong Xu","doi":"10.1109/SRDS.2007.18","DOIUrl":null,"url":null,"abstract":"Networked computing systems continue to grow in scale and in the complexity of their components and interactions. Component failures become norms instead of exceptions in these environments. Moreover, failure events exhibit strong correlations in time and space domain. In this paper, we develop a spherical covariance model with an adjustable timescale parameter to quantify the temporal correlation and a stochastic model to characterize spatial correlation. The models are further extended to take into account the information of application allocation to discover more correlations among failure instances. We cluster failure events based on their correlations and predict their future occurrences. Experimental results on a production coalition system, the Wayne State Grid, show the offline and online predictions by our predicting system can forecast 72.7% to 85.3% of the failure occurrences and capture failure correlations in cluster coalition environment.","PeriodicalId":224921,"journal":{"name":"2007 26th IEEE International Symposium on Reliable Distributed Systems (SRDS 2007)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"72","resultStr":"{\"title\":\"Quantifying Temporal and Spatial Correlation of Failure Events for Proactive Management\",\"authors\":\"S. Fu, Chengzhong Xu\",\"doi\":\"10.1109/SRDS.2007.18\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Networked computing systems continue to grow in scale and in the complexity of their components and interactions. Component failures become norms instead of exceptions in these environments. Moreover, failure events exhibit strong correlations in time and space domain. In this paper, we develop a spherical covariance model with an adjustable timescale parameter to quantify the temporal correlation and a stochastic model to characterize spatial correlation. The models are further extended to take into account the information of application allocation to discover more correlations among failure instances. We cluster failure events based on their correlations and predict their future occurrences. Experimental results on a production coalition system, the Wayne State Grid, show the offline and online predictions by our predicting system can forecast 72.7% to 85.3% of the failure occurrences and capture failure correlations in cluster coalition environment.\",\"PeriodicalId\":224921,\"journal\":{\"name\":\"2007 26th IEEE International Symposium on Reliable Distributed Systems (SRDS 2007)\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-10-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"72\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2007 26th IEEE International Symposium on Reliable Distributed Systems (SRDS 2007)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SRDS.2007.18\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2007 26th IEEE International Symposium on Reliable Distributed Systems (SRDS 2007)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SRDS.2007.18","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 72

摘要

网络计算系统在规模和其组件和交互的复杂性方面继续增长。在这些环境中,组件故障成为规范而不是异常。此外,失效事件在时间和空间上表现出很强的相关性。在本文中,我们建立了一个具有可调时间尺度参数的球形协方差模型来量化时间相关性,并建立了一个随机模型来表征空间相关性。在此基础上对模型进行了扩展,考虑了应用程序分配信息,以发现故障实例之间更多的相关性。我们根据它们的相关性对故障事件进行聚类,并预测它们未来的发生情况。在生产联盟系统Wayne State Grid上的实验结果表明,该预测系统的离线和在线预测可以预测集群联盟环境下72.7% ~ 85.3%的故障发生,并捕获故障相关性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Quantifying Temporal and Spatial Correlation of Failure Events for Proactive Management
Networked computing systems continue to grow in scale and in the complexity of their components and interactions. Component failures become norms instead of exceptions in these environments. Moreover, failure events exhibit strong correlations in time and space domain. In this paper, we develop a spherical covariance model with an adjustable timescale parameter to quantify the temporal correlation and a stochastic model to characterize spatial correlation. The models are further extended to take into account the information of application allocation to discover more correlations among failure instances. We cluster failure events based on their correlations and predict their future occurrences. Experimental results on a production coalition system, the Wayne State Grid, show the offline and online predictions by our predicting system can forecast 72.7% to 85.3% of the failure occurrences and capture failure correlations in cluster coalition environment.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信