Application of Reinforcement Learning to Software Rejuvenation
H. Okamura, T. Dohi
2011 Tenth International Symposium on Autonomous Decentralized Systems, published 2011-03-23
DOI: 10.1109/ISADS.2011.92 (https://doi.org/10.1109/ISADS.2011.92)
Software rejuvenation is a preventive and proactive maintenance solution that is particularly useful for counteracting software aging. Ideally, it should be triggered adaptively, without complete knowledge of the system failure (degradation) time distribution in the operational phase. In this paper we consider an operational software system with multiple degradation levels and derive the optimal software rejuvenation policy that maximizes the steady-state system availability, using a semi-Markov decision process. We develop a statistically non-parametric algorithm to estimate the optimal software rejuvenation schedule. A reinforcement learning algorithm, Q-learning, is then used to develop an on-line adaptive algorithm. A numerical example is presented to investigate the asymptotic behavior of the resulting on-line adaptive algorithm.
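To make the idea of learning a rejuvenation policy on-line more concrete, the following is a minimal sketch of tabular Q-learning on a toy degradation model. It is not the algorithm from the paper: it assumes a discrete-time, discounted setting rather than the semi-Markov, availability-maximizing formulation the abstract describes, and every name and number in it (`N_LEVELS`, `p_degrade`, `repair_penalty`, the reward values, the learning parameters) is a hypothetical choice made for illustration only.

```python
import random

# All constants below are illustrative assumptions, not values from the paper.
N_LEVELS = 3                  # degradation levels 0..2 (0 = fully robust)
FAILED = N_LEVELS             # extra state: the system has failed
CONTINUE, REJUVENATE = 0, 1   # the two actions available to the controller


def step(state, action, rng, p_degrade=0.3, repair_penalty=-5.0):
    """Toy environment transition: returns (next_state, reward)."""
    if state == FAILED:
        # Repair is forced whatever the action; it models a long outage.
        return 0, repair_penalty
    if action == REJUVENATE:
        # Planned restart: back to the robust state, no uptime earned.
        return 0, 0.0
    # Keep running: earn one unit of uptime, possibly degrade one level.
    nxt = state + 1 if rng.random() < p_degrade else state
    return nxt, 1.0


def q_learning(steps=50000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning along a single trajectory."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_LEVELS + 1)]
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice((CONTINUE, REJUVENATE))   # explore
        else:
            action = max((CONTINUE, REJUVENATE), key=lambda a: q[state][a])
        nxt, reward = step(state, action, rng)
        # Standard Q-learning update toward the one-step bootstrapped target.
        target = reward + gamma * max(q[nxt])
        q[state][action] += alpha * (target - q[state][action])
        state = nxt
    return q


def greedy_policy(q):
    """Greedy action for each operational degradation level."""
    return [max((CONTINUE, REJUVENATE), key=lambda a: q[s][a])
            for s in range(N_LEVELS)]


if __name__ == "__main__":
    learned = q_learning()
    print("greedy action per level (0=continue, 1=rejuvenate):",
          greedy_policy(learned))
```

The controller only observes the current degradation level and the reward it receives, so no failure-time distribution is supplied to the learner. That is the sense in which such a scheme is adaptive, although the paper's own formulation and convergence analysis may differ from this sketch.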