Estimating System Availability And Reliability
A. Goyal
1989 Winter Simulation Conference Proceedings, 1989-12-04
DOI: 10.1109/WSC.1989.718688 (https://doi.org/10.1109/WSC.1989.718688)
Abstract
This paper deals with methods for constructing and solving large Markov chain models of computer system availability and reliability. It discusses a set of powerful high-level modeling constructs that can be used to represent the failure and repair behavior of components and their interactions. If time-independent failure and repair rates are assumed, a time-homogeneous continuous-time Markov chain can be constructed automatically from the modeling constructs used to describe the system. Because the size of the Markov chain grows exponentially with the number of components modeled, simulation appears to be a practical way of solving models of large systems. However, standard simulation requires very long runs to estimate availability and reliability measures, because system failure is a rare event. Variance reduction techniques that help compute rare-event probabilities quickly are therefore of interest. Specifically, the Importance Sampling technique has been found to be the most useful. The modeling language and the simulation methods discussed in this paper have been implemented in a program package called the System Availability Estimator (SAVE).
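To illustrate why importance sampling helps with rare system failures, here is a minimal sketch on a toy model. It is not taken from the paper and is not SAVE's implementation. The model, the function names, and the particular change of measure (a simple "failure biasing" that forces failures to occur with a fixed probability `bias`) are all assumptions made for illustration. The toy system has `n` identical components, each failing at rate `lam`, and a single repair facility working at rate `mu`. The quantity estimated is the probability that all `n` components are down before the system returns to the all-up state, starting with one component failed.

```python
import random

def failure_prob(k, n, lam, mu):
    """Probability that the next event in state k (k of n failed) is a failure."""
    fail_rate = (n - k) * lam          # each working component fails at rate lam
    return fail_rate / (fail_rate + mu)  # single repair facility, rate mu

def exact_gamma(n, lam, mu):
    """Exact P(reach n failed before 0 failed | start at 1), birth-death formula."""
    total, prod = 1.0, 1.0
    for k in range(1, n):
        p = failure_prob(k, n, lam, mu)
        prod *= (1.0 - p) / p          # running product of repair/failure odds
        total += prod
    return 1.0 / total

def is_estimate(n, lam, mu, runs, bias=0.5, seed=0):
    """Importance-sampling estimate of the same probability (assumed biasing scheme)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(runs):
        k, weight = 1, 1.0
        while 0 < k < n:
            p = failure_prob(k, n, lam, mu)
            if rng.random() < bias:    # sample a failure with inflated probability
                weight *= p / bias     # likelihood ratio corrects for the bias
                k += 1
            else:                      # sample a repair
                weight *= (1.0 - p) / (1.0 - bias)
                k -= 1
        if k == n:                     # system failure reached: count weighted hit
            acc += weight
    return acc / runs

if __name__ == "__main__":
    n, lam, mu = 3, 1e-3, 1.0
    print("exact:", exact_gamma(n, lam, mu))
    print("importance sampling:", is_estimate(n, lam, mu, runs=20000))
```

With `n = 3`, `lam = 1e-3`, and `mu = 1.0`, the exact probability is roughly 2e-6, so a standard simulation would typically need hundreds of thousands of cycles to observe a single system failure. Under the biased measure, failures are sampled often and each run is reweighted by its likelihood ratio, which keeps the estimator unbiased while greatly reducing its variance.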