2011 IEEE 30th International Symposium on Reliable Distributed Systems: Latest Papers

An Approach Based on Swarm Intelligence for Event Dissemination in Dynamic Networks
2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date: 2011-10-04 DOI: 10.1109/SRDS.2011.23
Adam S. Banzi, A. Pozo, E. P. Duarte
Abstract: Dynamic networks require adaptive strategies for information dissemination, as the topology constantly changes. This work presents an event-based bio-inspired dissemination approach that employs ants, which correspond to mobile agents, to spread information throughout the network. An event is defined as a state transition of a node or link. A node that detects an event in its neighborhood triggers the dissemination. Pheromones are used both to control the ant population and to help define the paths that the agents take. An empirical study was performed, in which the proposed strategy was compared with flooding and gossip algorithms. Results show that the proposed strategy presents a good trade-off between the time required to disseminate information and the overhead in terms of the number of messages employed.
Citations: 2
Partition-Tolerant Distributed Publish/Subscribe Systems
2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date: 2011-10-04 DOI: 10.1109/SRDS.2011.21
R. Kazemzadeh, H. Jacobsen
Abstract: In this paper, we develop reliable distributed publish/subscribe algorithms that can tolerate concurrent failure of up to d broker machines or communication links. In our approach, d is a configuration parameter which determines the level of fault-tolerance of the system, and reliability refers to exactly-once and per-source, in-order delivery of publications to clients with matching subscriptions. We propose protocols to address three problems in the presence of broker or link failures: (i) subscription propagation, (ii) publication forwarding, and (iii) broker recovery. Finally, we study the effectiveness of our approach when the number of concurrent failures exceeds d. Through large-scale experimental evaluations with up to 500 brokers, we demonstrate that a system configured with a modest value of d = 3 is able to reliably deliver 97% of publications in the presence of failure of up to 17% of its brokers.
Citations: 23
ELT: Efficient Log-based Troubleshooting System for Cloud Computing Infrastructures
2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date: 2011-10-04 DOI: 10.1109/SRDS.2011.11
Kamal Kc, Xiaohui Gu
Abstract: We present an Efficient Log-based Troubleshooting (ELT) system for cloud computing infrastructures. ELT adopts a novel hybrid log mining approach that combines coarse-grained and fine-grained log features to achieve both high accuracy and low overhead. Moreover, ELT can automatically extract key log messages and perform invariant checking to greatly simplify the troubleshooting task for the system administrator. We have implemented a prototype of the ELT system and conducted an extensive experimental study using real management console logs of a production cloud system and a Hadoop cluster. Our experimental results show that ELT can achieve more efficient and powerful troubleshooting support than existing schemes. More importantly, ELT can find software bugs that cannot be detected by current cloud system management practice.
Citations: 42
Resilience-Driven Parameterisation of Ad Hoc Routing Protocols: olsrd as a Case Study
2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date: 2011-10-04 DOI: 10.1109/SRDS.2011.19
Jesus Friginal, D. Andrés, Juan-Carlos Ruiz-Garcia, P. Gil
Abstract: Ad hoc routing protocols are threatened by a variety of accidental and malicious faults that limit their use. Although a number of well-known strategies exist to enhance the performance and resilience of such protocols, their final effectiveness strongly relies on the usage of appropriate protocol configuration parameters. This paper investigates how to parameterise ad hoc routing protocols to combine high performance with acceptable levels of resilience and low consumption of resources. The research places the spotlight on olsrd, an ad hoc proactive routing protocol able to run on real devices and deploy challenge-response authentication, packet signature and fault tolerance strategies at runtime. The reported practical experience is carried out in different ad hoc networking contexts integrating different types of devices, thus checking the influence that mobility of nodes and device resource constraints have on the parameterisation of the different protocol features considered.
Citations: 5
A Characterization of Node Uptime Distributions in the PlanetLab Test Bed
2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date: 2011-10-04 DOI: 10.1109/SRDS.2011.32
Hakon Verespej, J. Pasquale
Abstract: In this paper, we study nodes from the PlanetLab test bed to form a model of their uptime behavior. By applying clustering techniques to over a year's worth of availability data for the nodes, we identify six uptime distributions, each exhibiting unique characteristics shared by the nodes within it. The behavioral patterns exhibited by these groups, combined with the behaviors exhibited by the aggregate across the system, provide useful information for researchers designing applications that are run or tested on PlanetLab.
Citations: 11
OSARE: Opportunistic Speculation in Actively REplicated Transactional Systems
2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date: 2011-10-04 DOI: 10.1109/SRDS.2011.16
R. Palmieri, F. Quaglia, P. Romano
Abstract: In this work we present OSARE, an active replication protocol for transactional systems that combines the usage of Optimistic Atomic Broadcast with a speculative concurrency control mechanism in order to overlap transaction processing and replica synchronization. OSARE biases the speculative serialization of transactions towards an order aligned with the optimistic message delivery order. However, due to the lock-free nature of its concurrency control algorithm, at high concurrency levels, namely when the probability of mismatches between optimistic and final deliveries is higher, OSARE explores additional alternative transaction serialization orders in a lightweight and opportunistic fashion. A simulation study we carried out in the context of Software Transactional Memory systems shows that OSARE achieves robust performance also in scenarios characterized by a non-minimal likelihood of reordering between optimistic and final deliveries, providing remarkable speed-up with respect to state-of-the-art speculative replication protocols.
Citations: 46
Active Replication at (Almost) No Cost
2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date: 2011-10-04 DOI: 10.1109/SRDS.2011.12
André Martin, C. Fetzer, Andrey Brito
Abstract: MapReduce has become a popular programming paradigm in the domain of batch processing systems. Its simplicity allows applications to be highly scalable and to be easily deployed on large clusters. More recently, the MapReduce approach has also been applied to Event Stream Processing (ESP) systems. This approach, which we call StreamMapReduce, enabled many novel applications that require both scalability and low latency. Another recent trend is to move distributed applications to public clouds such as Amazon EC2 rather than running and maintaining private data centers. Most cloud providers charge their customers on an hourly basis rather than on CPU cycles consumed. However, many applications, especially those that process online data, need to limit their CPU utilization to conservative levels (often as low as 50%) to be able to accommodate natural and sudden load variations without causing unacceptable deterioration in responsiveness. In this paper, we present a new fault tolerance approach based on active replication for StreamMapReduce systems. This approach is cost effective for cloud consumers as well as cloud providers. Cost effectiveness is achieved by fully utilizing the acquired computational resources without performance degradation and by reducing the need for additional nodes dedicated to fault tolerance.
Citations: 43
Transaction Models for Massively Multiplayer Online Games
2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date: 2011-10-04 DOI: 10.1109/SRDS.2011.13
Kaiwen Zhang, Bettina Kemme
Abstract: Massively Multiplayer Online Games are considered large distributed systems where the game state is partially replicated across the server and thousands of clients. Given the scale, game engines typically offer only relaxed consistency without well-defined guarantees. In this paper, we leverage the concept of transactions to define consistency models that are suitable for gaming environments. We define game-specific levels of consistency that differ in the degree of isolation and atomicity they provide, and demonstrate the costs associated with their execution. Each action type within a game can then be assigned the appropriate consistency level, choosing the right trade-off between consistency and performance. The issue of durability and fault-tolerance of game actions is also discussed.
Citations: 18
Dangers and Joys of Stock Trading on the Web: Failure Characterization of a Three-Tier Web Service
2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date: 2011-10-04 DOI: 10.1109/SRDS.2011.27
F. Arshad, S. Bagchi
Abstract: Characterizing latent software faults is crucial to address dependability issues of current three-tier systems. A client should not have a misconception that a transaction succeeded, when in reality, it failed due to a silent error. We present a fault injection-based evaluation to characterize silent and non-silent software failures in a representative three-tier web service, one that mimics a day trading application widely used for benchmarking application servers. For failure characterization, we quantify the distribution of silent and non-silent failures, and recommend low-cost application-generic and application-specific consistency checks, which improve the reliability of the application. We inject three variants of null-call, where a callee returns null to the caller without executing business logic. Additionally, we inject three types of unchecked exceptions and analyze the reaction of our application. Our results show that 49% of error injections from null-calls result in silent failures, while 34% of unchecked exceptions result in silent failures. Our generic consistency check can detect silent failures in null-calls with an accuracy as high as 100%. Non-silent failures with unchecked exceptions can be detected with an accuracy of 42% with our application-specific checks.
Citations: 2
A Theory of Fault Recovery for Component-Based Models
2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date: 2011-10-04 DOI: 10.1007/978-3-642-33536-5_31
Borzoo Bonakdarpour, M. Bozga, Gregor Gössler
Citations: 9