2011 IEEE 30th International Symposium on Reliable Distributed Systems: Latest Articles

An Approach Based on Swarm Intelligence for Event Dissemination in Dynamic Networks

2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date : 2011-10-04 DOI: 10.1109/SRDS.2011.23

Adam S. Banzi, A. Pozo, E. P. Duarte

Citations: 2

Partition-Tolerant Distributed Publish/Subscribe Systems

2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date : 2011-10-04 DOI: 10.1109/SRDS.2011.21

R. Kazemzadeh, H. Jacobsen

Citations: 23

ELT: Efficient Log-based Troubleshooting System for Cloud Computing Infrastructures

2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date : 2011-10-04 DOI: 10.1109/SRDS.2011.11

Kamal Kc, Xiaohui Gu

Citations: 42

Resilience-Driven Parameterisation of Ad Hoc Routing Protocols: olsrd as a Case Study

2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date : 2011-10-04 DOI: 10.1109/SRDS.2011.19

Jesus Friginal, D. Andrés, Juan-Carlos Ruiz-Garcia, P. Gil

Citations: 5

A Characterization of Node Uptime Distributions in the PlanetLab Test Bed

2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date : 2011-10-04 DOI: 10.1109/SRDS.2011.32

Hakon Verespej, J. Pasquale

Citations: 11

OSARE: Opportunistic Speculation in Actively REplicated Transactional Systems

2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date : 2011-10-04 DOI: 10.1109/SRDS.2011.16

R. Palmieri, F. Quaglia, P. Romano

Citations: 46

Active Replication at (Almost) No Cost

2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date : 2011-10-04 DOI: 10.1109/SRDS.2011.12

André Martin, C. Fetzer, Andrey Brito

Abstract: MapReduce has become a popular programming paradigm in the domain of batch processing systems. Its simplicity allows applications to be highly scalable and to be easily deployed on large clusters. More recently, the MapReduce approach has been also applied to Event Stream Processing (ESP) systems. This approach, which we call StreamMapReduce, enabled many novel applications that require both scalability and low latency. Another recent trend is to move distributed applications to public clouds such as Amazon EC2 rather than running and maintaining private data centers. Most cloud providers charge their customers on an hourly basis rather than on CPU cycles consumed. However, many applications, especially those that process online data, need to limit their CPU utilization to conservative levels (often as low as 50%) to be able to accommodate natural and sudden load variations without causing unacceptable deterioration in responsiveness. In this paper, we present a new fault tolerance approach based on active replication for StreamMapReduce systems. This approach is cost effective for cloud consumers as well as cloud providers. Cost effectiveness is achieved by fully utilizing the acquired computational resources without performance degradation and by reducing the need for additional nodes dedicated to fault tolerance.

Citations: 43

Transaction Models for Massively Multiplayer Online Games

2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date : 2011-10-04 DOI: 10.1109/SRDS.2011.13

Kaiwen Zhang, Bettina Kemme

Citations: 18

Dangers and Joys of Stock Trading on the Web: Failure Characterization of a Three-Tier Web Service

2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date : 2011-10-04 DOI: 10.1109/SRDS.2011.27

F. Arshad, S. Bagchi

Abstract: Characterizing latent software faults is crucial to address dependability issues of current three-tier systems. A client should not have a misconception that a transaction succeeded, when in reality, it failed due to a silent error. We present a fault injection-based evaluation to characterize silent and non-silent software failures in a representative three-tier web service, one that mimics a day trading application widely used for benchmarking application servers. For failure characterization, we quantify distribution of silent and non-silent failures, and recommend low cost application-generic and application-specific consistency checks, which improve the reliability of the application. We inject three variants of null-call, where a callee returns null to the caller without executing business logic. Additionally, we inject three types of unchecked exceptions and analyze the reaction of our application. Our results show that 49% of error injections from null-calls result in silent failures, while 34% of unchecked exceptions result in silent failures. Our generic-consistency check can detect silent failures in null-calls with an accuracy as high as 100%. Non-silent failures with unchecked exceptions can be detected with an accuracy of 42% with our application-specific checks.

Citations: 2

A Theory of Fault Recovery for Component-Based Models

2011 IEEE 30th International Symposium on Reliable Distributed Systems Pub Date : 2011-10-04 DOI: 10.1007/978-3-642-33536-5_31

Borzoo Bonakdarpour, M. Bozga, Gregor Gössler

Citations: 9