Miao Jiang, M. A. Munawar, Thomas Reidemeister, Paul A. S. Ward
{"title":"基于信息理论的复杂软件系统故障自动检测与诊断","authors":"Miao Jiang, M. A. Munawar, Thomas Reidemeister, Paul A. S. Ward","doi":"10.1109/DSN.2009.5270324","DOIUrl":null,"url":null,"abstract":"Management metrics of complex software systems exhibit stable correlations which can enable fault detection and diagnosis. Current approaches use specific analytic forms, typically linear, for modeling correlations. In this paper we use Normalized Mutual Information as a similarity measure to identify clusters of correlated metrics, without knowing the specific form. We show how we can apply the Wilcoxon Rank-Sum test to identify anomalous behaviour. We present two diagnosis algorithms to locate faulty components: RatioScore, based on the Jaccard Coefficient, and SigScore, which incorporates knowledge of component dependencies. We evaluate our mechanisms in the context of a complex enterprise application. Through fault-injection experiments, we show that we can detect 17 out of 22 faults without any false positives. We diagnose the faulty component in the top five anomaly scores 7 times out of 17 using SigScore, which is 40% better than when system structure is ignored.","PeriodicalId":376982,"journal":{"name":"2009 IEEE/IFIP International Conference on Dependable Systems & Networks","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"45","resultStr":"{\"title\":\"Automatic fault detection and diagnosis in complex software systems by information-theoretic monitoring\",\"authors\":\"Miao Jiang, M. A. Munawar, Thomas Reidemeister, Paul A. S. Ward\",\"doi\":\"10.1109/DSN.2009.5270324\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Management metrics of complex software systems exhibit stable correlations which can enable fault detection and diagnosis. Current approaches use specific analytic forms, typically linear, for modeling correlations. In this paper we use Normalized Mutual Information as a similarity measure to identify clusters of correlated metrics, without knowing the specific form. We show how we can apply the Wilcoxon Rank-Sum test to identify anomalous behaviour. We present two diagnosis algorithms to locate faulty components: RatioScore, based on the Jaccard Coefficient, and SigScore, which incorporates knowledge of component dependencies. We evaluate our mechanisms in the context of a complex enterprise application. Through fault-injection experiments, we show that we can detect 17 out of 22 faults without any false positives. We diagnose the faulty component in the top five anomaly scores 7 times out of 17 using SigScore, which is 40% better than when system structure is ignored.\",\"PeriodicalId\":376982,\"journal\":{\"name\":\"2009 IEEE/IFIP International Conference on Dependable Systems & Networks\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-09-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"45\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2009 IEEE/IFIP International Conference on Dependable Systems & Networks\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DSN.2009.5270324\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE/IFIP International Conference on Dependable Systems & Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSN.2009.5270324","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Automatic fault detection and diagnosis in complex software systems by information-theoretic monitoring
Management metrics of complex software systems exhibit stable correlations which can enable fault detection and diagnosis. Current approaches use specific analytic forms, typically linear, for modeling correlations. In this paper we use Normalized Mutual Information as a similarity measure to identify clusters of correlated metrics, without knowing the specific form. We show how we can apply the Wilcoxon Rank-Sum test to identify anomalous behaviour. We present two diagnosis algorithms to locate faulty components: RatioScore, based on the Jaccard Coefficient, and SigScore, which incorporates knowledge of component dependencies. We evaluate our mechanisms in the context of a complex enterprise application. Through fault-injection experiments, we show that we can detect 17 out of 22 faults without any false positives. We diagnose the faulty component in the top five anomaly scores 7 times out of 17 using SigScore, which is 40% better than when system structure is ignored.