Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.最新文献

筛选
英文 中文
Token-based atomic broadcast using unreliable failure detectors 使用不可靠故障检测器的基于令牌的原子广播
Richard Ekwall, A. Schiper, P. Urbán
{"title":"Token-based atomic broadcast using unreliable failure detectors","authors":"Richard Ekwall, A. Schiper, P. Urbán","doi":"10.1109/RELDIS.2004.1353003","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1353003","url":null,"abstract":"Many atomic broadcast algorithms have been published in the last twenty years. Token-based algorithms represent a large class of these algorithms. Interestingly, all the token-based atomic broadcast algorithms rely on a group membership service, i.e., none of them uses unreliable failure detectors directly. The paper presents the first token-based atomic broadcast algorithm that uses an unreliable failure detector - the new failure detector denoted by /spl Rscr/ - instead of a group membership service. The failure detector /spl Rscr/ is compared with <>V and <>S. In order to make it easier to understand the atomic broadcast algorithm, the paper derives the atomic broadcast algorithm from a token-based consensus algorithm that also uses the failure detector /spl Rscr/.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115353950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 38
Hardware support for high performance, intrusion- and fault-tolerant systems 硬件支持高性能,入侵和容错系统
G. P. Saggese, C. Basile, L. Romano, Z. Kalbarczyk, R. Iyer
{"title":"Hardware support for high performance, intrusion- and fault-tolerant systems","authors":"G. P. Saggese, C. Basile, L. Romano, Z. Kalbarczyk, R. Iyer","doi":"10.1109/RELDIS.2004.1353020","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1353020","url":null,"abstract":"The paper proposes a combined hardware/software approach for realizing high performance, intrusion- and fault-tolerant services. The approach is demonstrated for (yet not limited to) an attribute authority server, which provides a compelling application due to its stringent performance and security requirements. The key element of the proposed architecture is an FPGA-based, parallel crypto-engine providing (1) optimally dimensioned RSA Processors for efficient execution of computationally intensive RSA signatures and (2) a KeyStore facility used as tamper-resistant storage for preserving secret keys. To achieve linear speed-up (with the number of RSA Processors) and deadlock-free execution in spite of resource-sharing and scheduling/synchronization issues, we have resorted to a number of performance enhancing techniques (e.g., use of different clock domains, optimal balance between internal and external parallelism) and have formally modeled and mechanically proved our crypto-engine with the Spin model checker. At the software level, the architecture combines active replication and threshold cryptography, but in contrast with previous work, the code of our replicas is multithreaded so it can efficiently use an attached parallel crypto-engine to compute an attribute authority partial signature (as required by threshold cryptography). Resulting replicated systems that exhibit nondeterministic behavior, which cannot be handled with conventional replication approaches. Our architecture is based on a preemptive deterministic scheduling algorithm to govern scheduling of replica threads and guarantee strong replica consistency.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"201 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115711519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Run-time monitoring for dependable systems: an approach and a case study 可靠系统的运行时监控:一种方法和案例研究
Sérgio Ricardo Rota, J. R. Almeida
{"title":"Run-time monitoring for dependable systems: an approach and a case study","authors":"Sérgio Ricardo Rota, J. R. Almeida","doi":"10.1109/RELDIS.2004.1353002","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1353002","url":null,"abstract":"This paper describes a run-time monitoring system designed for same functionality systems installed in different places that use equivalent hardware configurations, but with slightly different implementations. These systems exhibit common characteristics. They are large software systems, they depend on hardware to execute their functions, and they are usually adjusted to meet new user needs. In this scenario it is unreasonable to assume that software testing will uncover all latent errors. Besides gathering information about a target program as it executes the run-time monitoring system proposed provides information about the target operating system and the target hardware in order to improve availability by reducing time to diagnose failures and provide a system with the reactive capability of reconfiguring and reinitializing after the occurrence of a failure. A case study for an automatic teller machine system is discussed as an application of the run-time monitoring system and the results from this application are presented.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"184 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116455799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 14
Dependable pervasive systems 可靠的普及系统
B. Randell
{"title":"Dependable pervasive systems","authors":"B. Randell","doi":"10.1109/RELDIS.2004.1352998","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1352998","url":null,"abstract":"Summary form only given. Present trends indicate that huge networked computer systems are likely to become pervasive, as information technology is embedded into virtually everything, and to be required to function essentially continuously. I believe that even today's (underused) \"best practice\" regarding the achievement of high dependability - reliability, availability, security, safety, etc. - from large networked computer systems will not suffice for future pervasive systems. I will give my perspective on the current state of research into the four basic dependability technologies: (i) fault prevention (to avoid the occurrence or introduction of faults), (ii) fault removal (through validation and verification), (iii) fault tolerance (so that failures do not necessarily occur even if faults remain), and (iv) fault forecasting (the means of assessing progress towards achieving adequate dependability). I will then argue that much further research is required on all four dependability technologies in order to cope with pervasive systems, identify some priorities, and discuss how this research could best be aimed at making system dependability into a \"commodity\" that industry can value and from which it can profit.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127188379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 22
Skewed checkpointing for tolerating multi-node failures 允许多节点故障的倾斜检查点
Hiroshi Nakamura, T. Hayashida, Masaaki Kondo, Yuya Tajima, Masashi Imai, T. Nanya
{"title":"Skewed checkpointing for tolerating multi-node failures","authors":"Hiroshi Nakamura, T. Hayashida, Masaaki Kondo, Yuya Tajima, Masashi Imai, T. Nanya","doi":"10.1109/RELDIS.2004.1353012","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1353012","url":null,"abstract":"Large cluster systems have become widely utilized because they achieve a good performance/cost ratio especially in high performance computing. Although these cluster systems are distributed memory systems, coordinated checkpointing is a promising way to maintain high availability because the computing nodes are tightly connected to one another. However, as the number of computing nodes gets larger, the probability of multi-node failures increases. To tolerate multi-node failures, a large degree of redundancy is required in checkpointing, but this leads to performance degradation. Thus, we propose a new coordinated checkpointing called skewed checkpointing. In this method, checkpointing is skewed every time. Although each checkpointing itself contains only one degree of redundancy, this skewed checkpointing ensures /spl lfloor/log/sub 2/N/spl rfloor/ degrees of redundancy when the number of nodes is N. In this paper, we present the proposed method and an analysis of the performance overhead. Then, this method is applied to a cluster system and compared with other conventional checkpointing schemes. The results reveal the superiority of our method, especially for large cluster systems.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"161 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134503667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
The design and evaluation of a defense system for Internet worms 网络蠕虫防御系统的设计与评价
R. Scandariato, J. Knight
{"title":"The design and evaluation of a defense system for Internet worms","authors":"R. Scandariato, J. Knight","doi":"10.1109/RELDIS.2004.1353017","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1353017","url":null,"abstract":"Many areas of society have become heavily dependent on services such as transportation facilities, utilities and so on that are implemented in part by large numbers of computers and communications links. Both past incidents and research studies show that a well-engineered Internet worm can disable such systems in a fairly simple way and, most notably, in a matter of a few minutes. This indicates the need for defenses against worms but their speed rules out the possibility of manually countering worm outbreaks. We present a platform that emulates the epidemic behavior of Internet active worms in very large networks. A reactive control system operates on top of the platform and provides a monitor/analyze/respond approach to deal with infections automatically. Details of our highly configurable platform and various experimental performance results are presented.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115461923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
An integrated architecture for dependable embedded systems 可靠嵌入式系统的集成体系结构
H. Kopetz
{"title":"An integrated architecture for dependable embedded systems","authors":"H. Kopetz","doi":"10.1109/RELDIS.2004.1353016","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1353016","url":null,"abstract":"Summary form only given. A federated architecture is characterized in that every major function of an embedded system is allocated to a dedicated hardware unit. In a distributed control system this implies that adding a new function is tantamount to adding a new node. This has led to a order to achieve some functional coordination. Adding fault-tolerance to a federated architecture, e.g., by the provision of triple modular redundancy (TMR) leads to a further significant increase in the number of nodes and networks. The major advantages of a dedicated architecture are the physical encapsulation of the nearly autonomous subsystems, their outstanding fault containment and their clear-cut complexity management (independent development) in case the subsystems are nearly autonomous. An integrated distributed architecture for mixed-criticality applications must be based on a core design that supports the safety requirements of the highest considered criticality class. This is of particular importance in safety-critical applications, where the physical structure of the integrated system is determined to a significant extent by the independence requirement of fault-containment regions. The central part of an integrated distributed architecture for time-critical systems must provide the following core services: deterministic and timely transport of messages; fault tolerant clock synchronization; strong fault isolation with respect to arbitrary node failures; and consistent diagnosis of failing nodes. Any architecture that provides these core services can be used as a base architecture for an integrated distributed embedded system architecture. An example of such an integrated architecture is the time-triggered architecture (TTA). In this contribution we describe the structure and the services of the TTA that has been developed during the last twenty years and is deployed in a number of safety-critical applications in the transport sector.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116579246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 16
An efficient checkpointing protocol for the minimal characterization of operational rollback-dependency trackability 一个有效的检查点协议,用于最小化操作回滚依赖可跟踪性的特征
Islene C. Garcia, L. E. Buzato
{"title":"An efficient checkpointing protocol for the minimal characterization of operational rollback-dependency trackability","authors":"Islene C. Garcia, L. E. Buzato","doi":"10.1109/RELDIS.2004.1353013","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1353013","url":null,"abstract":"A checkpointing protocol that enforces rollback-dependency trackability (RDT) during the progress of a distributed computation must induce processes to take forced checkpoints to avoid the formation of nontrackable rollback dependencies. A protocol based on the minimal characterization of RDT tests only the smallest set of nontrackable dependencies. The literature indicated that this approach would require the processes to maintain and propagate O(n/sup 2/) control information, where n is the number of processes in the computation. In this paper, we present a protocol that implements this approach using only O(n) control information.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124396584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 15
Using program analysis to identify and compensate for nondeterminism in fault-tolerant, replicated systems 使用程序分析来识别和补偿容错复制系统中的不确定性
Joseph G. Slember, P. Narasimhan
{"title":"Using program analysis to identify and compensate for nondeterminism in fault-tolerant, replicated systems","authors":"Joseph G. Slember, P. Narasimhan","doi":"10.1109/RELDIS.2004.1353026","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1353026","url":null,"abstract":"Fault-tolerant replicated applications are typically assumed to be deterministic, in order to ensure reproducible, consistent behavior and state across a distributed system. Real applications often contain nondeterministic features that cannot be eliminated. Through the novel application of program analysis to distributed CORBA applications, we decompose an application into its constituent structures, and discover the kinds of nondeterminism present within the application. We target the instances of nondeterminism that can be compensated for automatically, and highlight to the application programmer those instances of nondeterminism that need to be manually rectified. We demonstrate our approach by compensating for specific forms of nondeterminism and by quantifying the associated performance overheads. The resulting code growth is typically limited to one extra line for every instance of nondeterminism, and the runtime overhead is minimal, compared to a fault-tolerant application with no compensation for nondeterminism.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123016834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
Low latency probabilistic broadcast in wide area networks 广域网中的低延迟概率广播
J. Pereira, L. Rodrigues, A. Pinto, R. Oliveira
{"title":"Low latency probabilistic broadcast in wide area networks","authors":"J. Pereira, L. Rodrigues, A. Pinto, R. Oliveira","doi":"10.1109/RELDIS.2004.1353030","DOIUrl":"https://doi.org/10.1109/RELDIS.2004.1353030","url":null,"abstract":"In this paper we propose a novel probabilistic broadcast protocol that reduces the average end-to-end latency by dynamically adapting to network topology and traffic conditions. It does so by using an unique strategy that consists in adjusting the fanout and preferred targets for different gossip rounds as a function of the properties of each node. Node classification is light-weight and integrated in the protocol membership management. Furthermore, each node is not required to have full knowledge of the group membership or of the network topology. The paper shows how the protocol can be configured and evaluates its performance with a detailed simulation model.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132299679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 18
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信