{"title":"Error and failure analysis of a UNIX server","authors":"Ronjeet Lal, G. Choi","doi":"10.1109/HASE.1998.731618","DOIUrl":null,"url":null,"abstract":"This paper presents a measurement-based dependability study of a UNIX server. The event logs of a UNIX server are collected to form the dependability data basis. Message logs spanning approximately eleven months were collected for this study. The event log data are classified and categorized to calculate parameters such as MTBF and availability. Component analysis is also performed to identify modules that are prone to errors in the system. Next, the system error activity preceding each system failure is analyzed to identify error patterns that may be precursors of the observed failure events. Lastly, the error/failure results from the measurement are reviewed from the perspective of the fault/error assumptions made in several popular fault injection studies.","PeriodicalId":340424,"journal":{"name":"Proceedings Third IEEE International High-Assurance Systems Engineering Symposium (Cat. No.98EX231)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1998-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings Third IEEE International High-Assurance Systems Engineering Symposium (Cat. No.98EX231)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HASE.1998.731618","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
This paper presents a measurement-based dependability study of a UNIX server. The server's event logs, with message logs spanning approximately eleven months, were collected to form the dependability database. The event log data are classified and categorized to calculate parameters such as mean time between failures (MTBF) and availability. Component analysis is also performed to identify the modules in the system that are prone to errors. Next, the system error activity preceding each system failure is analyzed to identify error patterns that may be precursors of the observed failure events. Lastly, the error/failure results from the measurements are reviewed from the perspective of the fault/error assumptions made in several popular fault injection studies.
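As a rough illustration of the kind of calculation the abstract mentions, the sketch below derives MTBF and availability from a list of outage intervals. The abstract does not give the paper's exact definitions, so this assumes the common ones: MTBF is total uptime divided by the number of failures, and availability is uptime divided by the length of the observation window. The function name, the outage data, and the ten-day window are hypothetical and are not taken from the study.

```python
from datetime import datetime


def mtbf_and_availability(outages, window_start, window_end):
    """Return (MTBF in hours, availability) for one observation window.

    `outages` is a list of (down_at, up_at) datetime pairs. They are
    assumed to be non-overlapping and to lie inside the window.
    """
    total_s = (window_end - window_start).total_seconds()
    down_s = sum((up - down).total_seconds() for down, up in outages)
    up_s = total_s - down_s

    failures = len(outages)
    # Assumed definition: MTBF = uptime / number of failures.
    mtbf_hours = up_s / failures / 3600 if failures else float("inf")
    # Assumed definition: availability = uptime / window length.
    availability = up_s / total_s
    return mtbf_hours, availability


# Hypothetical data: a ten-day window with a 2-hour and a 4-hour outage.
window_start = datetime(1998, 1, 1)
window_end = datetime(1998, 1, 11)
outages = [
    (datetime(1998, 1, 3, 8), datetime(1998, 1, 3, 10)),
    (datetime(1998, 1, 7, 12), datetime(1998, 1, 7, 16)),
]

mtbf_hours, availability = mtbf_and_availability(outages, window_start, window_end)
print(f"MTBF: {mtbf_hours:.1f} h, availability: {availability:.3f}")
```

In practice the outage intervals would have to be reconstructed from the classified log entries (for example, by pairing shutdown and reboot messages), which is the step the classification described above would support.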