Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.最新文献

筛选
英文 中文
Progress in real-time fault tolerance 实时容错的进展
P. Melliar-Smith, L. Moser
{"title":"Progress in real-time fault tolerance","authors":"P. Melliar-Smith, L. Moser","doi":"10.1109/RELDIS.2004.1353010","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1353010","url":null,"abstract":"This paper discusses progress in the field of real-time fault tolerance. In particular, it considers synchronous vs. asynchronous fault tolerance designs, maintaining replica consistency, alternative fault tolerance strategies, including checkpoint restoration, transactions, and consistent replay, and custom vs. generic fault tolerance.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133334044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 12
Skewed checkpointing for tolerating multi-node failures 允许多节点故障的倾斜检查点
Hiroshi Nakamura, T. Hayashida, Masaaki Kondo, Yuya Tajima, Masashi Imai, T. Nanya
{"title":"Skewed checkpointing for tolerating multi-node failures","authors":"Hiroshi Nakamura, T. Hayashida, Masaaki Kondo, Yuya Tajima, Masashi Imai, T. Nanya","doi":"10.1109/RELDIS.2004.1353012","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1353012","url":null,"abstract":"Large cluster systems have become widely utilized because they achieve a good performance/cost ratio especially in high performance computing. Although these cluster systems are distributed memory systems, coordinated checkpointing is a promising way to maintain high availability because the computing nodes are tightly connected to one another. However, as the number of computing nodes gets larger, the probability of multi-node failures increases. To tolerate multi-node failures, a large degree of redundancy is required in checkpointing, but this leads to performance degradation. Thus, we propose a new coordinated checkpointing called skewed checkpointing. In this method, checkpointing is skewed every time. Although each checkpointing itself contains only one degree of redundancy, this skewed checkpointing ensures /spl lfloor/log/sub 2/N/spl rfloor/ degrees of redundancy when the number of nodes is N. In this paper, we present the proposed method and an analysis of the performance overhead. Then, this method is applied to a cluster system and compared with other conventional checkpointing schemes. The results reveal the superiority of our method, especially for large cluster systems.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"161 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134503667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
The design and evaluation of a defense system for Internet worms 网络蠕虫防御系统的设计与评价
R. Scandariato, J. Knight
{"title":"The design and evaluation of a defense system for Internet worms","authors":"R. Scandariato, J. Knight","doi":"10.1109/RELDIS.2004.1353017","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1353017","url":null,"abstract":"Many areas of society have become heavily dependent on services such as transportation facilities, utilities and so on that are implemented in part by large numbers of computers and communications links. Both past incidents and research studies show that a well-engineered Internet worm can disable such systems in a fairly simple way and, most notably, in a matter of a few minutes. This indicates the need for defenses against worms but their speed rules out the possibility of manually countering worm outbreaks. We present a platform that emulates the epidemic behavior of Internet active worms in very large networks. A reactive control system operates on top of the platform and provides a monitor/analyze/respond approach to deal with infections automatically. Details of our highly configurable platform and various experimental performance results are presented.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115461923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
An integrated architecture for dependable embedded systems 可靠嵌入式系统的集成体系结构
H. Kopetz
{"title":"An integrated architecture for dependable embedded systems","authors":"H. Kopetz","doi":"10.1109/RELDIS.2004.1353016","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1353016","url":null,"abstract":"Summary form only given. A federated architecture is characterized in that every major function of an embedded system is allocated to a dedicated hardware unit. In a distributed control system this implies that adding a new function is tantamount to adding a new node. This has led to a order to achieve some functional coordination. Adding fault-tolerance to a federated architecture, e.g., by the provision of triple modular redundancy (TMR) leads to a further significant increase in the number of nodes and networks. The major advantages of a dedicated architecture are the physical encapsulation of the nearly autonomous subsystems, their outstanding fault containment and their clear-cut complexity management (independent development) in case the subsystems are nearly autonomous. An integrated distributed architecture for mixed-criticality applications must be based on a core design that supports the safety requirements of the highest considered criticality class. This is of particular importance in safety-critical applications, where the physical structure of the integrated system is determined to a significant extent by the independence requirement of fault-containment regions. The central part of an integrated distributed architecture for time-critical systems must provide the following core services: deterministic and timely transport of messages; fault tolerant clock synchronization; strong fault isolation with respect to arbitrary node failures; and consistent diagnosis of failing nodes. Any architecture that provides these core services can be used as a base architecture for an integrated distributed embedded system architecture. An example of such an integrated architecture is the time-triggered architecture (TTA). In this contribution we describe the structure and the services of the TTA that has been developed during the last twenty years and is deployed in a number of safety-critical applications in the transport sector.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116579246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 16
An efficient checkpointing protocol for the minimal characterization of operational rollback-dependency trackability 一个有效的检查点协议,用于最小化操作回滚依赖可跟踪性的特征
Islene C. Garcia, L. E. Buzato
{"title":"An efficient checkpointing protocol for the minimal characterization of operational rollback-dependency trackability","authors":"Islene C. Garcia, L. E. Buzato","doi":"10.1109/RELDIS.2004.1353013","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1353013","url":null,"abstract":"A checkpointing protocol that enforces rollback-dependency trackability (RDT) during the progress of a distributed computation must induce processes to take forced checkpoints to avoid the formation of nontrackable rollback dependencies. A protocol based on the minimal characterization of RDT tests only the smallest set of nontrackable dependencies. The literature indicated that this approach would require the processes to maintain and propagate O(n/sup 2/) control information, where n is the number of processes in the computation. In this paper, we present a protocol that implements this approach using only O(n) control information.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124396584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 15
Using program analysis to identify and compensate for nondeterminism in fault-tolerant, replicated systems 使用程序分析来识别和补偿容错复制系统中的不确定性
Joseph G. Slember, P. Narasimhan
{"title":"Using program analysis to identify and compensate for nondeterminism in fault-tolerant, replicated systems","authors":"Joseph G. Slember, P. Narasimhan","doi":"10.1109/RELDIS.2004.1353026","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1353026","url":null,"abstract":"Fault-tolerant replicated applications are typically assumed to be deterministic, in order to ensure reproducible, consistent behavior and state across a distributed system. Real applications often contain nondeterministic features that cannot be eliminated. Through the novel application of program analysis to distributed CORBA applications, we decompose an application into its constituent structures, and discover the kinds of nondeterminism present within the application. We target the instances of nondeterminism that can be compensated for automatically, and highlight to the application programmer those instances of nondeterminism that need to be manually rectified. We demonstrate our approach by compensating for specific forms of nondeterminism and by quantifying the associated performance overheads. The resulting code growth is typically limited to one extra line for every instance of nondeterminism, and the runtime overhead is minimal, compared to a fault-tolerant application with no compensation for nondeterminism.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123016834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
Low latency probabilistic broadcast in wide area networks 广域网中的低延迟概率广播
J. Pereira, L. Rodrigues, A. Pinto, R. Oliveira
{"title":"Low latency probabilistic broadcast in wide area networks","authors":"J. Pereira, L. Rodrigues, A. Pinto, R. Oliveira","doi":"10.1109/RELDIS.2004.1353030","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1353030","url":null,"abstract":"In this paper we propose a novel probabilistic broadcast protocol that reduces the average end-to-end latency by dynamically adapting to network topology and traffic conditions. It does so by using an unique strategy that consists in adjusting the fanout and preferred targets for different gossip rounds as a function of the properties of each node. Node classification is light-weight and integrated in the protocol membership management. Furthermore, each node is not required to have full knowledge of the group membership or of the network topology. The paper shows how the protocol can be configured and evaluates its performance with a detailed simulation model.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132299679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 18
Model-based validation of an intrusion-tolerant information system 基于模型的入侵容忍信息系统验证
F. Stevens, T. Courtney, Sankalp Singh, A. Agbaria, J. F. Meyer, W. Sanders, P. Pal
{"title":"Model-based validation of an intrusion-tolerant information system","authors":"F. Stevens, T. Courtney, Sankalp Singh, A. Agbaria, J. F. Meyer, W. Sanders, P. Pal","doi":"10.1109/RELDIS.2004.1353019","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1353019","url":null,"abstract":"An increasing number of computer systems are designed to be distributed across both local and wide-area networks, performing a multitude of critical information-sharing and computational tasks. Malicious attacks on such systems are a growing concern, where attackers typically seek to degrade quality of service by intrusions that exploit vulnerabilities in networks, operating systems, and application software. Accordingly, designers are seeking improved techniques for validating such systems with respect to specified survivability requirements. In this regard, we describe a model-based validation effort that was undertaken as part of a unified approach to validating a networked intrusion-tolerant information system. Model-based results were used to guide the system's design as well as to determine whether a given survivability requirement was satisfied.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126666364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 79
Nested objects in a Byzantine quorum-replicated system 拜占庭仲裁复制系统中的嵌套对象
Charles P. Fry, M. Reiter
{"title":"Nested objects in a Byzantine quorum-replicated system","authors":"Charles P. Fry, M. Reiter","doi":"10.1109/RELDIS.2004.1353005","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1353005","url":null,"abstract":"Modern distributed, object-based systems support nested method invocations, whereby one object can invoke methods on another. In this paper we present a framework that supports nested method invocations among Byzantine fault-tolerant, replicated objects that are accessed via quorum systems. A challenge in this context is that client object replicas can induce unwanted method invocations on server object replicas, due either to redundant invocations by client replicas or Byzantine failures within the client replicas. At the core of our framework are a new quorum-based authorization technique and a novel method invocation protocol that ensure the linearizability and failure atomicity of nested method invocations despite Byzantine client and server replica failures. We detail the implementation of these techniques in a system called Fleet, and give preliminary performance results for them.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121777441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 14
Performance comparison of a rotating coordinator and a leader based consensus algorithm 旋转协调器与基于leader的共识算法的性能比较
P. Urbán, Naohiro Hayashibara, A. Schiper, T. Katayama
{"title":"Performance comparison of a rotating coordinator and a leader based consensus algorithm","authors":"P. Urbán, Naohiro Hayashibara, A. Schiper, T. Katayama","doi":"10.1109/RELDIS.2004.1352999","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1352999","url":null,"abstract":"Protocols that solve agreement problems are essential building blocks for fault tolerant distributed systems. While many protocols have been published, little has been done to analyze their performance, especially the performance of their fault tolerance mechanisms. In this paper, we compare two well-known asynchronous consensus algorithms. In both algorithms, a leader process tries to impose a decision, and another leader retries if the leader fails doing so. The algorithms elect leaders differently: the Chandra-Toueg algorithm has a rotating leader, whereas processes in the Paxos algorithm elect leaders directly. We investigate the performance implications of this difference. In the system under study, processes send atomic broadcasts to each other. Consensus is used to decide the delivery order of messages. We evaluate the steady state latency in (1) runs with neither crashes nor suspicions, (2) runs with crashes and (3) runs with no crashes in which correct processes are wrongly suspected to have crashed, as well as the transient latency after (4) one crash and (5) multiple correlated crashes. The results show that the Paxos algorithm tolerates frequent wrong suspicions (3) and correlated crashes (5) better, while the performance is comparable in all other scenarios.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124787171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 31
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信