Analysis of failures in the Tandem NonStop-UX Operating System
Anshuman Thakur, R. Iyer, L. Young, Inhwan Lee
Proceedings of Sixth International Symposium on Software Reliability Engineering (ISSRE'95), 1995-10-24
DOI: 10.1109/ISSRE.1995.497642
Citations: 26
Abstract
The paper presents results from an investigation of failures in several releases of Tandem's NonStop-UX Operating System, which is based on Unix System V. The analysis covers software failures from the field and failures reported by Tandem's test center. Faults are classified by the status of the reported failure, the point in the operating system code at which the error was detected, the panic message generated by the system, the module found to be faulty, and the type of programming mistake. This classification reveals which operating system modules generate the most faults and in which modules most errors are detected. We also present distributions of failure and repair times, including the interarrival time of unique failures and the time between duplicate failures. Unlike generic time distributions such as time between failures, these distributions help characterize software quality. The distribution of repair times highlights the repair process and the factors that influence repair. The distribution of system uptime before a panic reveals the factors that trigger the panic.
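The abstract separates the interarrival time of unique failures from the time between duplicate failures. The sketch below illustrates that distinction on a toy failure log. It is not the authors' procedure, and several choices are assumptions: each report carries a timestamp and a fault identifier, a report counts as a duplicate when its fault identifier has been seen before, and a duplicate's gap is measured from the most recent earlier report of the same fault. The function name `split_failure_times` and the sample log are hypothetical.

```python
from collections import defaultdict


def split_failure_times(reports):
    """Split a failure log into unique-failure interarrival times and
    times between duplicate failures (hypothetical helper).

    reports: iterable of (time, fault_id) pairs, times in one consistent unit.
    Assumption: a report is a duplicate if its fault_id appeared earlier, and
    its gap is measured from the most recent report of that same fault.
    """
    last_seen = {}                      # fault_id -> time of latest report
    unique_times = []                   # first-occurrence times, in order
    duplicate_gaps = defaultdict(list)  # fault_id -> gaps between repeats

    for t, fault in sorted(reports):
        if fault in last_seen:
            # Repeat of a known fault: record time since its previous report.
            duplicate_gaps[fault].append(t - last_seen[fault])
        else:
            # First time this fault is seen: it is a unique failure.
            unique_times.append(t)
        last_seen[fault] = t

    # Interarrival times between consecutive unique failures.
    interarrival = [b - a for a, b in zip(unique_times, unique_times[1:])]
    return interarrival, dict(duplicate_gaps)


if __name__ == "__main__":
    # Toy log of (day, fault_id); the values are invented for illustration.
    log = [(0, "A"), (5, "B"), (7, "A"), (12, "C"), (20, "A"), (21, "B")]
    inter, gaps = split_failure_times(log)
    print(inter)  # [5, 7]
    print(gaps)   # {'A': [7, 13], 'B': [16]}
```

Keeping the two series apart matters because a long run of duplicates of one known fault would otherwise shorten the apparent time between failures, even though no new defect has surfaced.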