Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004. Latest articles

The φ accrual failure detector

Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004. Pub Date : 2004-10-18 DOI: 10.1109/RELDIS.2004.1353004

Naohiro Hayashibara, X. Défago, Rami Yared, T. Katayama

Abstract: The detection of failures is a fundamental issue for fault-tolerance in distributed systems. Recently, many people have come to realize that failure detection ought to be provided as some form of generic service, similar to IP address lookup or time synchronization. However, this has not been successful so far, one of the reasons being that classical failure detectors were not designed to satisfy several application requirements simultaneously. We present a novel abstraction, called accrual failure detectors, that emphasizes flexibility and expressiveness and can serve as a basic building block for implementing failure detectors in distributed systems. Instead of providing information of a binary nature (trust vs. suspect), accrual failure detectors output a suspicion level on a continuous scale. The principal merit of this approach is that it favors a nearly complete decoupling between application requirements and the monitoring of the environment. In this paper, we describe an implementation of such an accrual failure detector, which we call the φ failure detector. The particularity of the φ failure detector is that it dynamically adjusts the scale on which the suspicion level is expressed to current network conditions. We analyzed the behavior of our φ failure detector over an intercontinental communication link over a week. Our experimental results show that it performs as well as other known adaptive failure detection mechanisms, with improved flexibility.
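As a rough illustration of the accrual idea described in the abstract, the Python sketch below turns heartbeat arrival times into a suspicion level. It assumes that heartbeat inter-arrival times are approximately normally distributed and are estimated over a sliding window, and it takes φ to be the negative base-10 logarithm of the probability that the next heartbeat arrives later than the time already elapsed. The class name, window size, and standard-deviation floor are illustrative assumptions and do not come from this listing.

```python
import math
from collections import deque
from statistics import mean, pstdev


class PhiAccrualDetector:
    """Illustrative accrual failure detector (hypothetical names and defaults)."""

    def __init__(self, window_size=1000, min_std=0.01):
        # Sliding window of recent heartbeat inter-arrival times.
        self.intervals = deque(maxlen=window_size)
        self.last_arrival = None
        # Floor on the standard deviation so perfectly regular
        # heartbeats do not cause a division by zero.
        self.min_std = min_std

    def heartbeat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now):
        """Return the suspicion level at time `now`."""
        if not self.intervals:
            return 0.0  # not enough samples to estimate anything yet
        mu = mean(self.intervals)
        sigma = max(pstdev(self.intervals), self.min_std)
        elapsed = now - self.last_arrival
        # P(next heartbeat arrives later than `elapsed`) under a normal model:
        # 1 - CDF(elapsed) = 0.5 * erfc((elapsed - mu) / (sigma * sqrt(2)))
        p_later = 0.5 * math.erfc((elapsed - mu) / (sigma * math.sqrt(2)))
        p_later = max(p_later, 1e-300)  # guard against log10(0) on underflow
        return -math.log10(p_later)


detector = PhiAccrualDetector()
for t in (0.0, 0.9, 2.0, 3.0, 4.2, 5.0):
    detector.heartbeat(t)

# Suspicion grows continuously as silence lasts longer.
levels = [detector.phi(5.0 + gap) for gap in (1.0, 1.5, 2.0)]
```

An application would compare the returned value against a threshold of its own choosing. A higher threshold trades slower detection for fewer wrong suspicions, which is the decoupling between application requirements and environment monitoring that the abstract refers to.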

Citations: 121

An hoarding approach for supporting disconnected write operations in mobile environments

Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004. Pub Date : 2004-10-18 DOI: 10.1109/RELDIS.2004.1353028

A. Vora, Z. Tari, P. Bertók

Abstract: Caching is one technique that reduces costs and improves performance in mobile environments. It also increases availability during temporary, involuntary disconnections. However, our focus is on voluntary, client-initiated disconnections, where hoarding can be used to predict data requirements. Existing hoarding approaches ignore conflicts arising out of write sharing and are thus unable to deal with them. However, since conflicts are detrimental to bandwidth utilisation, for scenarios with high write sharing, hoarding techniques need to provide support for sharing in a manner that reduces or avoids conflicts. We propose a hoarding approach for disconnected write operations that focuses on reducing the likelihood of conflicts arising from write sharing in a highly concurrent environment. Data that clients might need when disconnected is predicted based on the notion of semantic similarity. To avoid or reduce conflicts, data are first clustered based on their update probabilities. The hoard tree is then created based on the clusters and the semantic similarity between data. Simulations show an increase in the cache hit-rate along with a reduction in the total number of conflicts.

Citations: 9

How to tolerate half less one Byzantine nodes in practical distributed systems

Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004. Pub Date : 2004-10-18 DOI: 10.1109/RELDIS.2004.1353018

M. Correia, N. Neves, P. Veríssimo

Citations: 158

State maintenance and its impact on the performability of multi-tiered Internet services

Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004. Pub Date : 2004-10-18 DOI: 10.1109/RELDIS.2004.1353015

G. Gama, K. Nagaraja, R. Bianchini, R. Martin, Wagner Meira Jr, Thu D. Nguyen

Citations: 11

The mutable consensus protocol

Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004. Pub Date : 2004-10-18 DOI: 10.1109/RELDIS.2004.1353023

J. Pereira, R. Oliveira

Abstract: In this paper we propose the mutable consensus protocol, a pragmatic and theoretically appealing approach to enhance the performance of distributed consensus. First, an apparently inefficient protocol is developed using the simple stubborn channel abstraction for unreliable message passing. Then, performance is improved by introducing judiciously chosen finite delays in the implementation of channels. Although this does not compromise correctness, which rests on an asynchronous system model, it makes it likely that the transmission of some messages is avoided and thus the message exchange pattern at the network level changes noticeably. By choosing different delays in the underlying stubborn channels, the mutable consensus protocol can actually be made to resemble several different protocols. Besides presenting the mutable consensus protocol and four different mutations, we evaluate in detail the particularly interesting permutation gossip mutation, which allows the protocol to scale gracefully to a large number of processes by balancing the number of messages to be handled by each process with the number of communication steps required to decide. The evaluation is performed using a realistic simulation model which accurately reproduces resource consumption in real systems.

Citations: 15

Slow advances in fault-tolerant real-time distributed computing

Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004. Pub Date : 2004-10-18 DOI: 10.1109/RELDIS.2004.1353009

K. Kim

Citations: 7

Simple and efficient oracle-based consensus protocols for asynchronous Byzantine systems

Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004. Pub Date : 2004-10-18 DOI: 10.1109/TDSC.2005.13

R. Friedman, A. Mostéfaoui, M. Raynal

Abstract: This paper is on the consensus problem in asynchronous distributed systems where (up to f) processes (among n) can exhibit a Byzantine behavior, i.e., can deviate arbitrarily from their specification. A way to solve the consensus problem in such a context consists of enriching the system with additional oracles that are powerful enough to cope with the uncertainty and unpredictability created by the combined effect of Byzantine behavior and asynchrony. Considering two types of such oracles, namely, an oracle that provides processes with random values, and a failure detector oracle, the paper presents two families of Byzantine asynchronous consensus protocols. Two of these protocols are particularly noteworthy: they allow the processes to decide in one communication step in favorable circumstances. The first is a randomized protocol that assumes n > 5f. The second one is a failure detector-based protocol that assumes n > 6f. These protocols are designed to be particularly simple and efficient in terms of communication steps, the number of messages they generate in each step, and the size of messages. So, although they are not optimal in the number of Byzantine processes that can be tolerated, they are particularly efficient when we consider the number of communication steps they require to decide, and the number and size of the messages they use. In that sense, they are practically appealing.
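To make the resilience trade-off in the abstract concrete, the short Python sketch below computes the largest number of Byzantine processes f that satisfies a bound of the form n > k·f. The function name is hypothetical. The k = 3 case is included only for comparison, on the assumption that n > 3f is the optimal bound the abstract alludes to.

```python
def max_byzantine_faults(n, k):
    """Largest integer f such that n > k * f (illustrative helper)."""
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive integers")
    return (n - 1) // k


n = 11
randomized = max_byzantine_faults(n, 5)        # protocol assuming n > 5f
detector_based = max_byzantine_faults(n, 6)    # protocol assuming n > 6f
optimal = max_byzantine_faults(n, 3)           # assumed optimal bound n > 3f
```

With 11 processes, the two one-step protocols would tolerate 2 and 1 Byzantine processes respectively, against 3 under the n > 3f bound. That gap is the resilience given up in exchange for fewer communication steps.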

Citations: 89

Crash-resilient time-free eventual leadership

Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004. Pub Date : 2004-10-18 DOI: 10.1109/RELDIS.2004.1353022

A. Mostéfaoui, M. Raynal, Corentin Travers

Abstract: Leader-based protocols rest on a primitive able to provide the processes with the same unique leader. Such protocols are very common in distributed computing to solve synchronization or coordination problems. Unfortunately, providing such a primitive is far from being trivial in asynchronous distributed systems prone to process crashes. (It is even impossible in fault-prone purely asynchronous systems.) To circumvent this difficulty, several protocols have been proposed that build a leader facility on top of an asynchronous distributed system enriched with synchrony assumptions. This paper considers another approach to building a leader facility, namely, a behavioral property on the flow of messages that are exchanged. This property has the noteworthy feature of not involving timing assumptions. Two protocols based on this time-free property that implement a leader primitive are described. The first one uses potentially unbounded counters, while the second one (which is a little more involved) requires only finite memory. These protocols rely on simple design principles that make them attractive, easy to understand and provably correct.

Citations: 43

Self checking network protocols: a monitor based approach

Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004. Pub Date : 2004-10-18 DOI: 10.1109/RELDIS.2004.1353000

G. Khanna, Padma Varadharajan, S. Bagchi

Abstract: The wide deployment of high-speed computer networks has made distributed systems ubiquitous in today's connected world. The machines on which the distributed applications are hosted are heterogeneous in nature, the applications often run legacy code without the availability of their source code, the systems are of very large scales, and often have soft real-time guarantees. In this paper, we target the problem of online detection of disruptions through a generic external entity called Monitor that is able to observe the exchanged messages between the protocol participants and deduce any ongoing disruption by matching against a rule base composed of combinatorial and temporal rules. The Monitor architecture is application neutral, with the rule base making it specific to a protocol. To make the detection infrastructure scalable and dependable, we extend it to a hierarchical Monitor structure. The infrastructure is applied to a streaming video application running on a reliable multicast protocol called TRAM installed on the campus wide network. The evaluation brings out the scalability of the monitor infrastructure and detection coverage under different kinds of faults for the single level and the hierarchical arrangements.

Citations: 19

XNET: a reliable content-based publish/subscribe system

Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004. Pub Date : 2004-10-18 DOI: 10.1109/RELDIS.2004.1353027

Raphaël Chand, P. Felber

Citations: 89