A state machine approach for problem detection in large-scale distributed system
Kewei Sun, J. Qiu, Ying Li, Ying Chen, Weixing Ji
NOMS 2008 - 2008 IEEE Network Operations and Management Symposium, 2008-04-07. DOI: 10.1109/NOMS.2008.4575150
Efficient problem detection methods play an important role in system management. This paper describes a formal method for problem detection in large-scale, distributed enterprise IT environments. Events from distributed system components are collected, filtered, and correlated. Using these correlated events, the behavior of a distributed system is represented as a problem detection state machine (PDSM). The PDSM is built automatically from system logs, without any specification of the target system. The approach combines logs from multiple sources and requires neither human involvement nor experimental instructions. It is applicable to a large class of distributed systems. Experimental results show that the PDSM implementation detects problems efficiently in typical distributed enterprise systems.
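The abstract does not give the PDSM construction algorithm. Purely as an illustration of the collect, filter, correlate, and learn pipeline it outlines, the Python sketch below makes several assumptions that are not from the paper. Each log line is assumed to be already parsed into an `Event` with a timestamp, a source component, a shared correlation ID (for example a request ID), and a normalized event type. The "state machine" is a simple first-order model whose states are event types, with hypothetical `<start>` and `<end>` markers. All function and field names are hypothetical. The sketch learns the set of transitions seen in logs of normal operation and flags any correlated trace that takes a transition the model has never seen.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical parsed log record; the paper's actual event schema is not given.
@dataclass(frozen=True)
class Event:
    timestamp: float
    source: str          # component that emitted the log line
    correlation_id: str  # assumed shared key used to correlate events across sources
    event_type: str      # normalized event template

START, END = "<start>", "<end>"  # hypothetical initial and final states

def filter_events(events, keep_types):
    """Drop event types that are irrelevant to the monitored behavior."""
    return [e for e in events if e.event_type in keep_types]

def correlate(events):
    """Merge events from all sources into per-ID traces ordered by timestamp."""
    groups = defaultdict(list)
    for e in events:
        groups[e.correlation_id].append(e)
    return {cid: [ev.event_type for ev in sorted(evs, key=lambda ev: ev.timestamp)]
            for cid, evs in groups.items()}

def learn_state_machine(traces):
    """Collect every (state, next_state) transition seen in normal traces."""
    transitions = set()
    for seq in traces.values():
        states = [START] + seq + [END]
        transitions.update(zip(states, states[1:]))
    return transitions

def detect(transitions, traces):
    """Map each trace ID to the transitions it takes that were never learned."""
    problems = {}
    for cid, seq in traces.items():
        states = [START] + seq + [END]
        unseen = [t for t in zip(states, states[1:]) if t not in transitions]
        if unseen:
            problems[cid] = unseen
    return problems

# Usage: learn from normal logs, then check newly observed logs.
KEEP = {"request_received", "db_query", "response_sent"}
normal = [
    Event(1.0, "web", "r1", "request_received"),
    Event(2.0, "app", "r1", "db_query"),
    Event(2.2, "app", "r1", "heartbeat"),  # removed by filter_events
    Event(3.0, "web", "r1", "response_sent"),
    Event(1.5, "web", "r2", "request_received"),
    Event(2.5, "app", "r2", "db_query"),
    Event(3.5, "web", "r2", "response_sent"),
]
model = learn_state_machine(correlate(filter_events(normal, KEEP)))

observed = [
    Event(10.0, "web", "r3", "request_received"),
    Event(11.0, "app", "r3", "db_query"),  # request never completes
    Event(12.0, "web", "r4", "request_received"),
    Event(12.5, "web", "r4", "response_sent"),  # database step skipped
]
result = detect(model, correlate(filter_events(observed, KEEP)))
print(result)
# {'r3': [('db_query', '<end>')], 'r4': [('request_received', 'response_sent')]}
```

In this toy model, an unexpected transition into `<end>` flags a request that stopped partway (`r3`), and an unseen intermediate transition flags a skipped step (`r4`). The first-order model is chosen for brevity. It cannot capture longer-range dependencies between events, which a richer state definition could.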