Optimal mission abort and selective replacement policies for multi-state systems
Xian Zhao, Zuheng Lv, Qingan Qiu, Yaguang Wu
Reliability Engineering & System Safety, Vol. 264, Article 111366 (published 2025-06-15)
DOI: 10.1016/j.ress.2025.111366
https://www.sciencedirect.com/science/article/pii/S0951832025005678
Citations: 0
Abstract
To mitigate the failure risk in safety-critical systems, it is beneficial to implement mission abort and rescue procedures when specific malfunction conditions are identified. Existing mission abort models predominantly focus on multi-state systems with binary-state components and typically assume that all failed components are replaced after each rescue operation. However, many real-world engineering systems employ multi-state components, and replacing all failed components may not be optimal when replacement resources are limited. Effective mission abort and selective replacement policies for systems with multi-state components are therefore needed. In addition, existing selective replacement models focus primarily on the degradation condition of the system and overlook mission progress; because they do not account for how mission progress and system performance interact with the demand for component replacement, they can lead to suboptimal maintenance decisions. This paper introduces dynamic condition-based mission abort and selective replacement policies for k-out-of-n:F systems with multi-state components, which dynamically assess both the states of the system's components and the progress of mission execution. Mission success probability and system survivability are derived using recursive and discretization algorithms. We develop optimization models that maximize these probabilities while minimizing the expected costs of maintenance and replacement actions. A case study of a cloud computing system illustrates the advantages of the proposed policies and demonstrates their effectiveness relative to existing alternatives.
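The abstract mentions recursive algorithms for k-out-of-n:F systems. As a generic illustration only, and not the authors' algorithm, the sketch below shows the standard recursive computation of the probability that at least k of n components have failed, which is exactly the failure condition of a k-out-of-n:F system. It assumes independent components and collapses each multi-state component into a failed/not-failed indicator; it does not model mission abort, rescue, or selective replacement. The function names, the exponential lifetimes, and the failure rates in the usage block are hypothetical.

```python
import math
from typing import Sequence


def prob_at_least_k_failed(q: Sequence[float], k: int) -> float:
    """Return P(at least k of the n components have failed).

    q[i] is the failure probability of component i at the time of interest.
    Components are assumed independent (an assumption of this sketch).
    """
    n = len(q)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # dist[j] = P(exactly j failures among the components processed so far)
    dist = [1.0] + [0.0] * n
    for qi in q:
        # Iterate j downward so dist[j - 1] still holds the previous value.
        for j in range(n, 0, -1):
            dist[j] = dist[j] * (1.0 - qi) + dist[j - 1] * qi
        dist[0] *= 1.0 - qi
    return sum(dist[k:])


def k_out_of_n_f_reliability(q: Sequence[float], k: int) -> float:
    """A k-out-of-n:F system works unless k or more components have failed."""
    return 1.0 - prob_at_least_k_failed(q, k)


if __name__ == "__main__":
    # Hypothetical exponential failure rates (per hour), evaluated on a
    # discretized time grid; these numbers are illustrative only.
    rates = [0.01, 0.02, 0.015]
    for t in (10.0, 50.0, 100.0):
        q = [1.0 - math.exp(-r * t) for r in rates]
        print(f"t={t:>5}: R = {k_out_of_n_f_reliability(q, k=2):.4f}")
```

The in-place recursion costs O(n^2) operations, so it scales to large n where enumerating all 2^n failure patterns would not. Evaluating it on a time grid is one simple form of discretization; the paper's own discretization additionally tracks component states and mission progress.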
About the journal:
Elsevier publishes Reliability Engineering & System Safety in association with the European Safety and Reliability Association and the Safety Engineering and Risk Analysis Division. The international journal is devoted to developing and applying methods to enhance the safety and reliability of complex technological systems, such as nuclear power plants, chemical plants, hazardous waste facilities, space systems, offshore and maritime systems, transportation systems, constructed infrastructure, and manufacturing plants. The journal normally publishes only articles that involve the analysis of substantive problems related to the reliability of complex systems or present techniques and/or theoretical results that have a discernible relationship to the solution of such problems. An important aim is to balance academic material and practical applications.