{"title":"Exploiting Node Connection Regularity for DHT Replication","authors":"Alessio Pace, Vivien Quéma, V. Schiavoni","doi":"10.1109/SRDS.2011.22","DOIUrl":"https://doi.org/10.1109/SRDS.2011.22","url":null,"abstract":"Distributed Hash-Tables (DHTs) provide an efficient way to store objects in large-scale peer-to-peer systems. To guarantee that objects are reliably stored, DHTs rely on replication. Several replication strategies have been proposed in the last years. The most efficient ones use predictions about the availability of nodes to reduce the number of object migrations that need to be performed: objects are preferably stored on highly available nodes. This paper proposes an alternative replication strategy. Rather than exploiting highly available nodes, we propose to leverage nodes that exhibit regularity in their connection pattern. Roughly speaking, the strategy consists in replicating each object on a set of nodes that is built in such a way that, with high probability, at any time, there are always at least $k$ nodes in the set that are available. We evaluate this replication strategy using traces of two real-world systems: eDonkey and Skype. The evaluation shows that our regularity-based replication strategy induces a systematically lower network usage than existing state of the art replication strategies.","PeriodicalId":116805,"journal":{"name":"2011 IEEE 30th International Symposium on Reliable Distributed Systems","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123330088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CloudInsight: Shedding Light on the Cloud","authors":"A. Arefin, Guofei Jiang","doi":"10.1109/SRDS.2011.34","DOIUrl":"https://doi.org/10.1109/SRDS.2011.34","url":null,"abstract":"Cloud computing provides a revolutionary new computing paradigm for deploying enterprise applications and Internet services. Rather than operating their own data centers, today cloud users run their applications on the remote cloud infrastructures that are owned and managed by cloud providers. However, the cloud computing paradigm also introduces some new challenges in system management. Cloud users create virtual machine instances to run their specific application logic without knowing the underlying physical infrastructure. On the other side, cloud providers manage and operate their cloud infrastructures without knowing their customers' applications. Due to the decoupled ownership of applications and infrastructures, if a problem occurs, there is no visibility for either cloud users or providers to understand the whole context of the incident and solve it quickly. To this end, we propose a software solution, CloudInsight, to provide some visibility through the middle virtualization layer for both cloud users and providers to address their problems quickly. CloudInsight automatically tracks each VM instance's configuration status and maintains their life-cycle configuration records in a configuration management database (CMDB). When a user reports a problem, our algorithms automatically analyze CMDB to probabilistically determine the root cause and invoke a recovery process by interacting with the cloud user. Experimental results over data from Amazon EC2 online support forum and NEC Labs' research cloud infrastructures demonstrate that our approach can effectively automate the problem troubleshooting process in cloud environments.","PeriodicalId":116805,"journal":{"name":"2011 IEEE 30th International Symposium on Reliable Distributed Systems","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130400036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Reduction of Atomic Broadcast to Consensus with Byzantine Faults","authors":"Zarko Milosevic, Martin Hutle, A. Schiper","doi":"10.1109/SRDS.2011.36","DOIUrl":"https://doi.org/10.1109/SRDS.2011.36","url":null,"abstract":"We investigate the reduction of atomic broadcast to consensus in systems with Byzantine faults. Among the several definitions of Byzantine consensus that differ only by their validity property, we identify those equivalent to atomic broadcast. Finally, we give the first deterministic atomic broadcast reduction with a constant time complexity with respect to consensus.","PeriodicalId":116805,"journal":{"name":"2011 IEEE 30th International Symposium on Reliable Distributed Systems","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133046496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DONUT: Building Shortcuts in Large-Scale Decentralized Systems with Heterogeneous Peer Distributions","authors":"Sergey Legtchenko, Sébastien Monnet, Pierre Sens","doi":"10.1109/SRDS.2011.20","DOIUrl":"https://doi.org/10.1109/SRDS.2011.20","url":null,"abstract":"Large-scale distributed systems gather thousands of peers spread all over the world. Such systems need to offer good routing performance regardless of their size and despite high churn rates. To achieve that requirement, the system must add appropriate shortcuts to its logical graph (overlay). However, to choose efficient shortcuts, peers need to obtain information about the overlay topology. In case of heterogeneous peer distributions, retrieving such information is not straightforward. Moreover, due to churn, the topology rapidly evolves, making gathered information obsolete. State-of-the-art systems either avoid the problem by forcing peers to adopt a uniform distribution or only partially fulfill these requirements. To cope with this problem, we propose DONUT, a mechanism to build a local map that approximates the peer distribution, allowing the peer to accurately estimate graph distance to other peers with a local algorithm. The evaluation performed with real latency and churn traces shows that our map increases the routing process efficiency by at least 20% compared to the state-of-the-art techniques. It also shows that each map is lightweight and can be efficiently propagated through the network by consuming less than 10 bps on each peer.","PeriodicalId":116805,"journal":{"name":"2011 IEEE 30th International Symposium on Reliable Distributed Systems","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124230630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying Compromised Users in Shared Computing Infrastructures: A Data-Driven Bayesian Network Approach","authors":"A. Pecchia, Aashis Sharma, Z. Kalbarczyk, Domenico Cotroneo, R. Iyer","doi":"10.1109/SRDS.2011.24","DOIUrl":"https://doi.org/10.1109/SRDS.2011.24","url":null,"abstract":"The growing demand for processing and storage capabilities has led to the deployment of high-performance computing infrastructures. Users log into the computing infrastructure remotely, by providing their credentials (e.g., username and password), through the public network and using well-established authentication protocols, e.g., SSH. However, user credentials can be stolen and an attacker (using a stolen credential) can masquerade as the legitimate user and penetrate the system as an insider. This paper deals with security incidents initiated by using stolen credentials that occurred during the last three years at the National Center for Supercomputing Applications (NCSA) at the University of Illinois. We analyze the key characteristics of the security data produced by the monitoring tools during the incidents and use a Bayesian network approach to correlate (i) data provided by different security tools (e.g., IDS and NetFlows) and (ii) information related to the users' profiles to identify compromised users, i.e., the users whose credentials have been stolen. The technique is validated with the real incident data. The experimental results demonstrate that the proposed approach is effective in detecting compromised users, while eliminating around 80% of false positives (i.e., uncompromised users being declared compromised).","PeriodicalId":116805,"journal":{"name":"2011 IEEE 30th International Symposium on Reliable Distributed Systems","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131677405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing Performance of Lease-Based Schemes under Failures","authors":"R. Vitenberg, D. Zinenko, Kristian Kvilekval, Ambuj K. Singh","doi":"10.1109/SRDS.2011.31","DOIUrl":"https://doi.org/10.1109/SRDS.2011.31","url":null,"abstract":"Leases have proved to be an effective concurrency control technique for distributed systems that are prone to failures. However, many benefits of leases are only realized when leases are granted for approximately the time of expected use. Correct assessment of lease duration has proven difficult for all but the simplest of resource allocation problems. In this paper, we present a model that captures a number of lease styles and semantics used in practice. We consider a few performance characteristics for lease-based systems and analytically derive how they are affected by lease duration. We confirm our analytical findings by running a set of experiments with the OO7 benchmark suite using a variety of workloads and fault loads.","PeriodicalId":116805,"journal":{"name":"2011 IEEE 30th International Symposium on Reliable Distributed Systems","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114691540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scheduling of Dynamic Participants in Real-Time Distributed Systems","authors":"M. Sin, Mélanie Bouroche, V. Cahill","doi":"10.1109/SRDS.2011.37","DOIUrl":"https://doi.org/10.1109/SRDS.2011.37","url":null,"abstract":"Access to shared resources can be controlled by schedules or mutual exclusion. Such methods are not practical in an environment with dynamic participants, where nodes requiring access to shared resources can enter or leave the scene at any time. Current scheduling methods are usually centralized, demand that the system has a clear idea of when the resources are required and do not consider communication failures. Current implementations of distributed mutual exclusion use token- or permission-based methods. Dynamic participation amplifies the lost token problem in token-based approaches, while limited knowledge of the number of nodes makes obtaining quora and consensus in permission-based approaches impossible, rendering both mutual exclusion implementations impractical. This paper presents the CwoRIS protocol which enables short-term scheduling in real-time within an environment with dynamic participants. It motivates the need to support dynamic participants by means of a scenario for autonomous vehicle coordination in intersection crossing. The paper shows that the protocol is able to work in an environment with message loss and argues its correctness by showing mutual exclusion: there are no cases in which two nodes have access to the same resources at the same time.","PeriodicalId":116805,"journal":{"name":"2011 IEEE 30th International Symposium on Reliable Distributed Systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122705954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Latent Features for Memory-Based QoS Prediction in Cloud Computing","authors":"Yilei Zhang, Zibin Zheng, Michael R. Lyu","doi":"10.1109/SRDS.2011.10","DOIUrl":"https://doi.org/10.1109/SRDS.2011.10","url":null,"abstract":"With the increasing popularity of cloud computing as a solution for building high-quality applications on distributed components, efficiently evaluating user-side quality of cloud components becomes an urgent and crucial research problem. However, invoking all the available cloud components from user-side for evaluation purpose is expensive and impractical. To address this critical challenge, we propose a neighborhood-based approach, called CloudPred, for collaborative and personalized quality prediction of cloud components. CloudPred is enhanced by feature modeling on both users and components. Our approach CloudPred requires no additional invocation of cloud components on behalf of the cloud application designers. The extensive experimental results show that CloudPred achieves higher QoS prediction accuracy than other competing methods. We also publicly release our large-scale QoS dataset for future related research in cloud computing.","PeriodicalId":116805,"journal":{"name":"2011 IEEE 30th International Symposium on Reliable Distributed Systems","volume":"81 1-2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123430553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Candy: Component-based Availability Modeling Framework for Cloud Service Management Using SysML","authors":"F. Machida, E. Andrade, Dong Seong Kim, Kishor S. Trivedi","doi":"10.1109/SRDS.2011.33","DOIUrl":"https://doi.org/10.1109/SRDS.2011.33","url":null,"abstract":"High-availability assurance of cloud service is a critical and challenging issue for cloud service providers. To quantify the availability of cloud services from both architectural and operational points of view, availability modeling and evaluation are essential. This paper presents a component-based availability modeling framework, named Candy, which constructs a comprehensive availability model semi-automatically from system specifications described by Systems Modeling Language (SysML). SysML diagrams are translated into components of availability model and the components are assembled together to form the entire availability model in Stochastic Reward Nets (SRNs). In order to incorporate the maintenance operations of cloud services in availability models, Candy defines the translation rules from Activity diagram to SRN and synchronizes the related SRNs according to SysML allocation notations. The feasibility of the proposed modeling and availability evaluation process is studied by an illustrative example of a web application service hosted on a cloud infrastructure having multiple failure isolation zones and automatic scale-up function.","PeriodicalId":116805,"journal":{"name":"2011 IEEE 30th International Symposium on Reliable Distributed Systems","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126829072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DiveInto: Supporting Diversity in Intrusion-Tolerant Systems","authors":"João Antunes, N. Neves","doi":"10.1109/SRDS.2011.25","DOIUrl":"https://doi.org/10.1109/SRDS.2011.25","url":null,"abstract":"Intrusion tolerant services are usually implemented as replicated systems. If replicas execute identical software, then they share the same vulnerabilities and the whole system can be easily compromised if a single flaw is found. One solution to this problem is to introduce diversity by using different server implementations, but this increases the chances of incompatibility between replicas. This paper studies various kinds of incompatibilities and presents a new methodology to evaluate the compliance of diverse server replicas. The methodology collects network traces to identify syntax and semantic violations, and to assist in their resolution. A tool called DiveInto was developed based on the methodology and was applied to three replication scenarios. The experiments demonstrate that DiveInto is capable of discovering various sorts of violations, including problems related to nondeterministic execution.","PeriodicalId":116805,"journal":{"name":"2011 IEEE 30th International Symposium on Reliable Distributed Systems","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128920584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}