Software Aging and Software Rejuvenation: Keynote
K. Trivedi
Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering (published 2019-04-04)
DOI: 10.1145/3297663.3310290 (https://doi.org/10.1145/3297663.3310290)
Citations: 2
Abstract
The study of software failures has become more important now that it has been recognized that more computer system outages are caused by software faults than by hardware faults. The phenomenon of "software aging", in which the state of a software system degrades with time, has been reported in widely used software as well as in high-availability and safety-critical systems. The primary causes of this degradation are the exhaustion of operating system resources, data corruption, and the accumulation of numerical errors. Aging may eventually lead to performance degradation of the software system, to crash/hang failures, or to both. To counteract this phenomenon, a proactive approach to fault management, called "software rejuvenation", has been proposed. It essentially involves gracefully terminating an application or a system and restarting it in a clean internal state. This process removes the accumulated errors and frees up operating system resources. Rejuvenation therefore avoids or postpones unplanned and potentially expensive system outages due to software aging. In this talk, we discuss methods of evaluating the effectiveness of proactive fault management in operational software systems and of determining optimal times to perform rejuvenation.
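One classical way to frame the question of an "optimal time to perform rejuvenation" is the age-replacement model from reliability theory. What follows is a minimal sketch under stated assumptions, not the method presented in the talk. It assumes that the time to an aging-related failure follows a Weibull distribution with shape > 1, so that the hazard rate increases with uptime. It further assumes that a planned rejuvenation costs a short, fixed downtime `D_REJUV` and an unplanned crash/hang costs a longer, fixed downtime `D_FAIL`. All parameter values and function names are hypothetical and chosen only for illustration. Each cycle ends either in a planned restart at uptime `T` or in a failure before `T`, and a renewal-reward argument then gives the steady-state availability as mean uptime divided by mean cycle length.

```python
import math

# Hypothetical model parameters (illustrative only, not from the talk):
# aging-related time to failure ~ Weibull(shape > 1, scale), in hours;
# a planned rejuvenation costs D_REJUV hours of downtime;
# an unplanned crash/hang costs D_FAIL hours of downtime.
SHAPE, SCALE = 2.0, 1000.0
D_REJUV, D_FAIL = 0.1, 2.0


def reliability(t: float, shape: float, scale: float) -> float:
    """R(t): probability that no aging-related failure has occurred by time t."""
    return math.exp(-((t / scale) ** shape))


def expected_uptime(T: float, shape: float, scale: float, steps: int = 2000) -> float:
    """Mean uptime per cycle: the integral of R(t) over [0, T] (trapezoid rule)."""
    h = T / steps
    total = 0.5 * (reliability(0.0, shape, scale) + reliability(T, shape, scale))
    for i in range(1, steps):
        total += reliability(i * h, shape, scale)
    return total * h


def availability(T: float, shape: float, scale: float, d_rejuv: float, d_fail: float) -> float:
    """Steady-state availability when rejuvenating after T hours of uptime.

    A cycle ends with a planned rejuvenation (probability R(T)) or with an
    unplanned failure before T (probability 1 - R(T)).
    """
    r_T = reliability(T, shape, scale)
    up = expected_uptime(T, shape, scale)
    down = d_rejuv * r_T + d_fail * (1.0 - r_T)
    return up / (up + down)


def availability_without_rejuvenation(shape: float, scale: float, d_fail: float) -> float:
    """Availability when the system simply runs until it fails: MTTF / (MTTF + D_FAIL)."""
    mttf = scale * math.gamma(1.0 + 1.0 / shape)
    return mttf / (mttf + d_fail)


def best_interval(candidates, shape, scale, d_rejuv, d_fail) -> float:
    """Grid search for the rejuvenation interval that maximizes availability."""
    return max(candidates, key=lambda T: availability(T, shape, scale, d_rejuv, d_fail))


if __name__ == "__main__":
    grid = range(50, 3001, 50)
    T_star = best_interval(grid, SHAPE, SCALE, D_REJUV, D_FAIL)
    print("best rejuvenation interval (h):", T_star)
    print("availability with rejuvenation:", availability(T_star, SHAPE, SCALE, D_REJUV, D_FAIL))
    print("availability without rejuvenation:", availability_without_rejuvenation(SHAPE, SCALE, D_FAIL))
```

With these hypothetical numbers the best interval lies at a few hundred hours, well before the mean time to failure, and it yields higher availability than letting the system run until it crashes. The payoff depends on the aging assumption: with shape = 1 (an exponential lifetime, i.e. no aging) this model never favors rejuvenation, because a restart does not lower the failure rate.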