Characterizing the Dependability of Distributed Storage Systems Using a Two-Layer Hidden Markov Model-Based Approach

2010 IEEE Fifth International Conference on Networking, Architecture, and Storage. Pub Date: 2010-07-15. DOI: 10.1109/NAS.2010.28

Xin Chen, James Warren, Fang Han, Xubin He

Citations: 3

Abstract

Dependability is of paramount importance in modern distributed storage systems. A key challenge in deploying a storage system with specific dependability requirements, or in improving the dependability of an existing system, is how to characterize that dependability comprehensively and efficiently. In this paper, we present a two-layer Hidden Markov Model (HMM) to characterize the dependability of a distributed storage system, focusing on the parallel file system layer. By training the model with observable measurements, such as I/O performance, collected under faulty scenarios, we quantify system dependability as a tuple of state transition probability, service degradation, and fault latency under those scenarios. Our experimental results on a distributed storage system running PVFS (Parallel Virtual File System) demonstrate the effectiveness of our HMM-based approach, which efficiently captures the behavior patterns of the target system under disk faults and memory overuse.

