{"title":"Performance Issue Diagnosis for Online Service Systems","authors":"Qiang Fu, Jian-Guang Lou, Qingwei Lin, Rui Ding, D. Zhang, Zihao Ye, Tao Xie","doi":"10.1109/SRDS.2012.49","DOIUrl":"https://doi.org/10.1109/SRDS.2012.49","url":null,"abstract":"Monitoring and diagnosing performance issues of an online service system are critical to assure satisfactory performance of the system. Given a detected performance issue and collected system metrics for an online service system, engineers usually need to make great efforts to conduct diagnosis by first identifying performance issue beacons, which are metrics that point to the root causes. In order to reduce manual effort, in this paper, we propose a new approach to effectively detecting performance issue beacons to help with performance issue diagnosis. Our approach includes techniques for mining system metric data to address limitations when applying previous classification-based approaches. Our evaluations on both a controlled environment and a real production environment show that our approach can more effectively identify performance issue beacons from system metric data than previous approaches.","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"43 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116792710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An End-to-End Security Auditing Approach for Service Oriented Architectures","authors":"M. Azarmi, B. Bhargava, Pelin Angin, R. Ranchal, Norman Ahmed, A. Sinclair, M. Linderman, L. B. Othmane","doi":"10.1109/SRDS.2012.5","DOIUrl":"https://doi.org/10.1109/SRDS.2012.5","url":null,"abstract":"Service-Oriented Architecture (SOA) is becoming a major paradigm for distributed application development in the recent explosion of Internet services and cloud computing. However, SOA introduces new security challenges not present in the single-hop client-server architectures due to the involvement of multiple service providers in a service request. The interactions of independent service domains in SOA could violate service policies or SLAs. In addition, users in SOA systems have no control over what happens in the chain of service invocations. Although the establishment of trust across all involved partners is required as a prerequisite to ensure secure interactions, still a new end-to-end security auditing mechanism is needed to verify the actual service invocations and their conformance to the expected service orchestration. In this paper, we provide an efficient solution for end-to-end security auditing in SOA. The proposed security architecture introduces two new components called taint analysis and trust broker in addition to taking advantage of WS-Security and WS-Trust standards. The interaction of these components maintains session auditing and dynamic trust among services. This solution is transparent to the services, which allows auditing of legacy services without modification. Moreover, we have implemented a prototype of the proposed approach and verified its effectiveness in a LAN setting and the Amazon EC2 cloud computing infrastructure.","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131174805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Providing Uniform Reliable Broadcast Delivery for Mobile Ad Hoc Networks with MANET Liveness Property","authors":"J. Brzeziński, M. Kalewski, Jacek Kobusinski","doi":"10.1109/SRDS.2012.53","DOIUrl":"https://doi.org/10.1109/SRDS.2012.53","url":null,"abstract":"The MANET liveness property ensures that no operative host in an ad hoc network is permanently isolated, and for networks that fulfill the property a few crash-tolerant broadcast protocols have been proposed. However, the protocols proposed so far guarantee only that at least an arbitrary majority of operative hosts receives each disseminated message, and one of these protocols has been further modified to fulfill the properties of regular reliable broadcast. Moreover, it has also been proved that the minimum time of direct connectivity between hosts, and thus the correctness of all these protocols, depends on the total number of hosts in a network and on the total number of messages that can be disseminated by each host concurrently. In this paper, we propose a novel uniform reliable broadcast protocol that works correctly even when the minimum time of a direct connection between hosts allows them to exchange only two messages, which makes the correctness of this protocol independent of the total number of messages that can be disseminated by all nodes in a network.","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131211040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault Localization in MANET-Hosted Service-Based Systems","authors":"P. Novotný, A. Wolf, B. J. Ko","doi":"10.1109/SRDS.2012.30","DOIUrl":"https://doi.org/10.1109/SRDS.2012.30","url":null,"abstract":"Fault localization in general refers to a technique for identifying the likely root causes of failures observed in systems formed from components. Fault localization in systems deployed on mobile ad hoc networks (MANETs) is a particularly challenging task because those systems are subject to a wider variety and higher incidence of faults than those deployed in fixed networks, the resources available to track fault symptoms are severely limited, and many of the sources of faults in MANETs are by their nature transient. We present a method for localizing the faults occurring in service-based systems hosted on MANETs. The method is based on the use of dependence data that are discovered dynamically through decentralized observations of service interactions. We employ both Bayesian and timing-based reasoning techniques to analyze the data in the context of a specific fault propagation model, deriving a ranked list of candidate fault locations. We present the results of an extensive set of experiments exploring a wide range of operational conditions to evaluate the accuracy of our method.","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115215035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aggregating CVSS Base Scores for Semantics-Rich Network Security Metrics","authors":"Pengsu Cheng, Lingyu Wang, S. Jajodia, A. Singhal","doi":"10.1109/SRDS.2012.4","DOIUrl":"https://doi.org/10.1109/SRDS.2012.4","url":null,"abstract":"A network security metric is desirable in evaluating the effectiveness of security solutions in distributed systems. Aggregating CVSS scores of individual vulnerabilities provides a practical approach to a network security metric. However, existing approaches to aggregating CVSS scores usually cause useful semantics of individual scores to be lost in the aggregated result. In this paper, we address this issue through two novel approaches. First, instead of taking each base score as an input, our approach drills down to the underlying base metric level where dependency relationships have well-defined semantics. Second, our approach interprets and aggregates the base metrics from three different aspects in order to preserve corresponding semantics of the individual scores. Finally, we confirm the advantages of our approaches through simulation.","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125936564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable and Secure Polling in Dynamic Distributed Networks","authors":"S. Gambs, R. Guerraoui, Hamza Harkous, Florian Huc, Anne-Marie Kermarrec","doi":"10.1109/SRDS.2012.63","DOIUrl":"https://doi.org/10.1109/SRDS.2012.63","url":null,"abstract":"We consider the problem of securely conducting a poll in synchronous dynamic networks equipped with a Public Key Infrastructure (PKI). Whereas previous distributed solutions had a communication cost of O(n^2) in an n nodes system, we present SPP (Secure and Private Polling), the first distributed polling protocol requiring only a communication complexity of O(n log^3 n), which we prove is near-optimal. Our protocol ensures perfect security against a computationally-bounded adversary, tolerates (1/2 - ϵ)n Byzantine nodes for any constant 1/2 > ϵ > 0 (not depending on n), and outputs the exact value of the poll with high probability. SPP is composed of two sub-protocols, which we believe to be interesting on their own: SPP-Overlay maintains a structured overlay when nodes leave or join the network, and SPP-Computation conducts the actual poll. We validate the practicality of our approach through experimental evaluations and describe briefly two possible applications of SPP: (1) an optimal Byzantine Agreement protocol whose communication complexity is Θ(n log n) and (2) a protocol solving an open question of King and Saia in the context of aggregation functions, namely on the feasibility of performing multiparty secure aggregations with a communication complexity of o(n^2).","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121726085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximation Techniques for Maintaining Real-Time Deployments Informed by User-Provided Dataflows within a Cloud","authors":"James R. Edmondson, A. Gokhale, D. Schmidt","doi":"10.1109/SRDS.2012.7","DOIUrl":"https://doi.org/10.1109/SRDS.2012.7","url":null,"abstract":"Distributed applications are increasingly developed by composing many participants, such as services, components, and objects. When deploying distributed applications into a mobile ad hoc cloud, the locality of application participants that communicate with each other can affect latency, power/battery usage, throughput, and whether or not a cloud provider can meet service-level agreements (SLA). Optimization of important communication links within a distributed application is particularly important when dealing with mission-critical applications deployed in a distributed real-time and embedded (DRE) scenario, where violation of SLAs may result in loss of property, cyber infrastructure, or lives. To complicate the optimization process, the underlying cloud environment can change during operation and an optimal deployment of the distributed application may degrade over time due to hardware failures, overloaded hosts, and other issues that are beyond the control of distributed application developers. To optimize performance of distributed applications in dynamic environments, therefore, the deployment of participants may need adapting and revising according to the requirements of application developers and the resources available in the underlying cloud environment. This paper presents two contributions to the study of dynamic optimizations of user-provided deployments within a cloud. First, we present a dataflow description language that allows developers to designate key communication paths between participants within their distributed applications. Second, we describe heuristics that use this dataflow representation to identify optimal configurations for initial deployments and/or subsequent redeployments within a cloud. An experiment is presented to validate the heuristic approaches.","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"28 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133203653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Query Plan Execution in a Heterogeneous Stream Management System for Situational Awareness","authors":"I. Ray, S. Madria, M. Linderman","doi":"10.1109/SRDS.2012.54","DOIUrl":"https://doi.org/10.1109/SRDS.2012.54","url":null,"abstract":"Battlefield monitoring involves collecting streaming data from different sources, transmitting the data over a heterogeneous network, and processing queries in real-time in order to respond to events in a timely manner. Nodes in these networks differ with respect to their processing, storage and communication capabilities. Links in the network differ with respect to their communication bandwidth. The topology of the network itself is subject to change, as the nodes and links may become unavailable. Continuous queries executed in such environments must also meet some quality of service (QoS) requirements, such as response time, throughput, and memory usage. We propose that the processing of the queries be shared to improve resource utilization, such as storage and bandwidth, which, in turn, will impact the QoS. We show how multiple queries can be represented in the form of an operator tree, such that their commonalities can be easily exploited for multi-query plan generation. Query plans may have to be updated in this dynamic environment (network topology changes, arrival of new queries, arrival pattern of streams altered); this, in turn, necessitates migrating operators from one set of nodes to another. We sketch some ideas about how operator migration can be done efficiently in such environments.","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114708256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GRADE: Graceful Degradation in Byzantine Quorum Systems","authors":"Jingqiang Lin, Bo Luo, Jiwu Jing, Xiaokun Zhang","doi":"10.1109/SRDS.2012.34","DOIUrl":"https://doi.org/10.1109/SRDS.2012.34","url":null,"abstract":"Distributed storage systems are expected to provide correct services in the presence of Byzantine failures, for which no assumptions are made about the behavior of faulty servers and clients. In designing such systems, we often encounter the paradox of fault tolerance vs. performance (or efficiency), because better fault tolerance usually requires a tradeoff of system performance. In this paper, we present GRADE, a Byzantine-fault-tolerant (BFT) distributed storage system that enables graceful degradation. Two Byzantine quorum systems (BQSs) are supported on each GRADE server: a masking BQS storing generic data and a dissemination BQS storing self-verifying data. Based on the system status and the environment, servers dynamically and seamlessly switch between the two BQSs, without converting the stored data. Therefore, GRADE provides high performance in a normal running-state, and degrades performance to maintain high fault tolerance in emergency situations. The computation and communication costs of the running-state switch are very low, and the switch is completely transparent to clients. Our performance analysis and experimental results demonstrate that GRADE provides a balance between performance and fault tolerance.","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"152 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133958006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Susceptibility Analysis of Structured P2P Systems to Localized Eclipse Attacks","authors":"Daniel Germanus, R. Langenberg, Abdelmajid Khelil, N. Suri","doi":"10.1109/SRDS.2012.70","DOIUrl":"https://doi.org/10.1109/SRDS.2012.70","url":null,"abstract":"Peer-to-Peer (P2P) protocols are susceptible to Localized Eclipse Attacks (LEA), i.e., attacks where a victim peer's environment is masked by malicious peers which are then able to instigate progressively insidious security attacks. To obtain effective placement of malicious peers, LEAs significantly benefit from overlay topology-awareness. Hence, we propose heuristics for Chord, Pastry and Kademlia to assess the protocols' LEA susceptibility based on their topology characteristics and overlay routing mechanisms. As a result, our method can be used for P2P protocol parameter tuning in order to substantially mitigate LEAs. We present evaluations highlighting LEA's impact on contemporary P2P protocols. Our proposed heuristics are abstract in nature, making them applicable to, and customizable for, many other structured P2P protocols. We validate our model's accuracy through a simulation case study.","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121978311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}