An approach towards benchmarking of fault-tolerant commercial systems

Proceedings of Annual Symposium on Fault Tolerant Computing
Pub Date: 1996-06-25
DOI: 10.1109/FTCS.1996.534616

T. Tsai, R. Iyer, Doug Jewitt

Citations: 130

Abstract

This paper presents a benchmark for dependable systems. The benchmark consists of two metrics, the number of catastrophic incidents and performance degradation. Both are obtained by a tool that (1) generates synthetic workloads that produce a high level of CPU, memory, and I/O activity and (2) injects CPU, memory, and I/O faults according to an injection strategy. The benchmark has been installed on two TMR-based (triple modular redundancy) prototype machines, TMR Prototype A and TMR Prototype B. An implementation for a third prototype, which is based on a duplex architecture, is in progress. The results demonstrate the utility of the benchmark in comparing the system-level fault tolerance of these machines and in providing insight into their design. In particular, the benchmark shows that Prototype B suffers fewer catastrophic incidents than Prototype A under the same workload conditions and fault injection method. However, Prototype B also suffers more performance degradation in the presence of faults, which might be an important concern for time-critical applications.
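The abstract does not say exactly how the two metrics are computed from the injection runs. The Python sketch below shows one plausible way to aggregate them, under assumptions that are ours and not the paper's: each run is recorded as a hypothetical `InjectionRun` holding the injected fault type, a flag for whether a catastrophic incident (for example, a crash or hang) occurred, and the workload's completion time. Performance degradation is taken to be the mean relative slowdown against a fault-free baseline, computed only over runs that did not end catastrophically. The names `InjectionRun` and `compute_benchmark_metrics` are illustrative only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class InjectionRun:
    """One fault-injection run (hypothetical record format, not from the paper)."""
    fault_type: str          # "cpu", "memory", or "io"
    catastrophic: bool       # assumed: True if the run ended in a crash, hang, etc.
    completion_time: float   # workload completion time in seconds


def compute_benchmark_metrics(runs, baseline_time):
    """Return (number of catastrophic incidents, mean performance degradation).

    Assumption: degradation is the relative slowdown versus a fault-free
    baseline, averaged over runs that did not end catastrophically.
    Degradation is NaN if every run was catastrophic.
    """
    incidents = sum(1 for r in runs if r.catastrophic)
    survivors = [r.completion_time for r in runs if not r.catastrophic]
    if not survivors:
        return incidents, float("nan")
    degradation = mean((t - baseline_time) / baseline_time for t in survivors)
    return incidents, degradation


# Made-up example data, not results from the paper.
runs = [
    InjectionRun("cpu", False, 110.0),
    InjectionRun("memory", True, 0.0),
    InjectionRun("io", False, 130.0),
]
incidents, degradation = compute_benchmark_metrics(runs, baseline_time=100.0)
print(incidents, round(degradation, 3))
```

With these made-up numbers, a fault-free baseline of 100 s and two surviving runs finishing in 110 s and 130 s give one catastrophic incident and a mean degradation of 0.2 (20%). Reporting the two metrics separately, rather than folding them into a single score, matches the abstract's observation that one machine can do better on incidents while doing worse on degradation.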