Research on Online Failure Prediction Model and Status Pretreatment Method for Exascale System
Hao Zhou, Yanhuang Jiang
2011 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, 2011-10-10
DOI: 10.1109/CyberC.2011.68 (https://doi.org/10.1109/CyberC.2011.68)
Citations: 2
Abstract
Reliability is a critical concern for Exascale systems. Traditional passive fault-tolerance methods, such as rollback-recovery, can no longer fully guarantee system reliability because of their high execution overhead and long recovery time. Active fault tolerance is expected to become another important fault-tolerance approach for Exascale systems. Focusing on system failure prediction, a key step of active fault tolerance, we construct an online failure prediction model and study effective methods for system status pretreatment. To improve the accuracy and real-time performance of current methods, the proposed Improved Adaptive Semantic Filter (IASF) method periodically processes the latest system logs and filters out useless information according to the semantics of the log records. Adopting the main idea of the Vector Space Model (VSM), IASF creates an Event Vector for each log record. It evaluates the semantic correlation between log records by computing the cosine of the angle between their vectors, and then applies temporal and spatial redundancy filtering that takes the bursty nature of log records into account. IASF is insensitive to the type of system log and requires no expert system or domain knowledge. Experimental results show that IASF eliminates about 99.6% of useless log records.
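
The abstract does not give IASF's details, so the following is only a minimal sketch of the general idea it describes: build a term-frequency vector per log record, compare records by cosine similarity, and drop records that repeat a recent, semantically similar one. The function names, the tokenizer, the fixed `window` and `threshold` values, and the same-node rule are all assumptions made for illustration; the adaptive and spatial parts of the method are not reproduced here.

```python
import math
import re
from collections import Counter


def event_vector(message):
    """Term-frequency vector for one log message (hypothetical tokenizer:
    lowercase alphabetic words only, so numeric IDs are ignored)."""
    return Counter(re.findall(r"[a-z]+", message.lower()))


def cosine(u, v):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def temporal_filter(records, window=60.0, threshold=0.8):
    """Drop records that are semantically close to an already kept record
    from the same node within `window` seconds.

    `records` is a list of (timestamp, node, message) tuples sorted by
    timestamp. `window` and `threshold` are illustrative fixed values,
    not parameters taken from the paper.
    """
    kept = []  # entries: (timestamp, node, vector, message)
    for ts, node, msg in records:
        vec = event_vector(msg)
        redundant = any(
            node == k_node
            and ts - k_ts <= window
            and cosine(vec, k_vec) >= threshold
            for k_ts, k_node, k_vec, _ in kept
        )
        if not redundant:
            kept.append((ts, node, vec, msg))
    return [(ts, node, msg) for ts, node, _, msg in kept]


if __name__ == "__main__":
    logs = [
        (0.0, "n1", "ECC error detected on memory bank 3"),
        (5.0, "n1", "ECC error detected on memory bank 4"),
        (10.0, "n1", "link failure on port 2"),
        (500.0, "n1", "ECC error detected on memory bank 3"),
    ]
    for record in temporal_filter(logs):
        print(record)
```

In this sketch the second record is dropped as a near-duplicate of the first, while the last record is kept because it falls outside the time window. A spatial filter could follow the same pattern by comparing records across different nodes rather than within one node.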