Title: Software defects and their impact on system availability - a study of field failures in operating systems
Authors: M. Sullivan, R. Chillarege
Published in: [1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium
Publication date: 1991-06-25
DOI: 10.1109/FTCS.1991.146625
URL: https://doi.org/10.1109/FTCS.1991.146625
Citation count: 398
Abstract
Defects reported between 1986 and 1989 in the MVS operating system are studied in order to gain the insight needed to provide a clear strategy for avoiding or tolerating them. Typical defects (regular) are compared to those that corrupt a program's memory (overlay), given that overlays are considered by field services to be particularly hard to find and fix. It is shown that the impact of an overlay defect is, on average, much higher than that of a regular defect, that boundary conditions and allocation management are the major causes of overlay defects, not timing, and that most overlays are small and corrupt data near the data that the programmer meant to update. Further analysis is provided on defects in fixes to other defects, failure symptoms, and the impact of defects on customers.