Proactive Failure Management for High Availability Computing in Computer Clusters
Authors: Ziming Zhang, Song Fu
Venue: 2010 Third International Joint Conference on Computational Science and Optimization
Published: 2010-05-28
DOI: 10.1109/CSO.2010.71 (https://doi.org/10.1109/CSO.2010.71)
In this paper, we propose a framework for autonomic failure management with hierarchical failure prediction functionality for coalition clusters. It analyzes node-level, cluster-level, and system-wide failure behaviors and forecasts prospective failure occurrences based on quantified failure dynamics. The predictor also inspects failure correlations. Experimental results from a campus computational grid show that the offline and online predictions by our predictors accurately forecast the failure trend and capture failure correlations in a coalition cluster environment.
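As a rough illustration of what hierarchical failure analysis can look like, the sketch below rolls failure counts up from nodes to clusters to the whole system, forecasts the next window, and measures how strongly two clusters' failures move together. It is not the authors' method: the event log, the function names, the choice of simple exponential smoothing as the forecaster, and the use of Pearson correlation are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical failure log: one (time_window, cluster, node) tuple per failure.
events = [
    (0, "A", "a1"), (0, "B", "b1"),
    (1, "A", "a1"), (1, "A", "a2"), (1, "B", "b1"),
    (2, "A", "a2"),
    (3, "A", "a1"), (3, "A", "a2"), (3, "B", "b2"),
]
N_WINDOWS = 4


def counts_per_window(events, key, n_windows):
    """Count failures per time window, grouped by key(cluster, node)."""
    series = defaultdict(lambda: [0] * n_windows)
    for window, cluster, node in events:
        series[key(cluster, node)][window] += 1
    return dict(series)


def smooth_forecast(series, alpha=0.5):
    """One-step-ahead forecast via simple exponential smoothing (assumed stand-in)."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


# The three levels of the hierarchy: node, cluster, system.
node_series = counts_per_window(events, lambda c, n: (c, n), N_WINDOWS)
cluster_series = counts_per_window(events, lambda c, n: c, N_WINDOWS)
system_series = counts_per_window(events, lambda c, n: "system", N_WINDOWS)["system"]

next_window_forecast = smooth_forecast(system_series)
cluster_correlation = pearson(cluster_series["A"], cluster_series["B"])
```

The same `counts_per_window` helper serves all three levels by changing only the grouping key, so a forecaster or correlation measure written for one level can be reused at the others.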