Estimating System Availability And Reliability

1989 Winter Simulation Conference Proceedings. Pub Date: 1989-12-04. DOI: 10.1109/WSC.1989.718688

A. Goyal

Abstract

This paper deals with methods for constructing and solving large Markov chain models of computer system availability and reliability. It discusses a set of powerful high-level modeling constructs that can be used to represent the failure and repair behavior of components and their interactions. If time-independent failure and repair rates are assumed, a time-homogeneous continuous-time Markov chain can be constructed automatically from the modeling constructs used to describe the system. Since the size of the Markov chain grows exponentially with the number of components modeled, simulation appears to be a practical way of solving models of large systems. However, standard simulation requires very long runs to estimate availability and reliability measures because system failure is a rare event. Therefore, variance reduction techniques that help compute rare-event probabilities quickly are of interest. Specifically, the Importance Sampling technique has been found to be most useful. The modeling language and the simulation methods discussed in this paper have been implemented in a program package called the System Availability Estimator (SAVE).
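To make the rare-event idea concrete, here is a minimal sketch of importance sampling on a toy model. It is not taken from the paper or from SAVE. Everything in it is an assumption chosen for illustration: the model (n identical components, each failing at rate `lam`, one repair at a time at rate `mu`, system failed when all n are down), the function names, and the specific change of measure (forcing the failure transition with a fixed probability `bias`, often called failure biasing). It estimates the probability that the system fails before returning to the all-up state, and compares the estimate with the closed-form value for this birth-death chain.

```python
import random


def exact_failure_probability(n, lam, mu):
    """Exact P(all n components down before all are up again),
    starting with one failed component (birth-death hitting probability)."""
    total, ratio = 1.0, 1.0
    for k in range(1, n):
        fail_rate = (n - k) * lam   # n - k components are still up
        ratio *= mu / fail_rate     # q_k / p_k in the embedded chain
        total += ratio
    return 1.0 / total


def is_estimate(n, lam, mu, bias=0.5, runs=20000, seed=1):
    """Importance-sampling estimate of the same probability.

    In each state the next event is a failure with probability `bias`
    (hypothetical tuning parameter) instead of its true, tiny probability.
    The likelihood ratio `weight` corrects for the change of measure.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        k, weight = 1, 1.0          # k = number of failed components
        while 0 < k < n:
            p_fail = (n - k) * lam / ((n - k) * lam + mu)
            if rng.random() < bias:
                weight *= p_fail / bias
                k += 1
            else:
                weight *= (1.0 - p_fail) / (1.0 - bias)
                k -= 1
        if k == n:                  # system failure reached in this cycle
            total += weight
    return total / runs


if __name__ == "__main__":
    print("exact:", exact_failure_probability(3, 0.001, 1.0))
    print("importance sampling:", is_estimate(3, 0.001, 1.0))
```

With failure rates much smaller than the repair rate, a standard simulation of this model would almost never see a failure within a cycle, whereas under the biased dynamics a sizable fraction of cycles end in failure and the weights keep the estimator unbiased.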
