{"title":"Online Behavior Identification in Distributed Systems","authors":"J. '. Cid-Fuentes, Claudia Szabo, K. Falkner","doi":"10.1109/SRDS.2015.16","DOIUrl":"https://doi.org/10.1109/SRDS.2015.16","url":null,"abstract":"The diagnosis, prediction, and understanding of unexpected behavior is crucial for long running, large scale distributed systems. However, existing works focus on the identification of faults in specific time moments preceded by significantly abnormal metric readings, or require a previous analysis of historical failure data. In this work, we propose an online behavior classification system to identify a wide range of undesired behaviors, which may appear even in healthy systems, and their evolution over time. We employ a two-step process involving two online classifiers on periodically collected system metrics to identify at runtime normal and anomalous behaviors such as deadlock, starvation and livelock, without any previous analysis of historical failure data. Our approach achieves over 80% accuracy in detecting unexpected behaviors and over 90% accuracy in identifying their type with a short delay after the anomalies appear, and with minimal expert intervention. Our experimental analysis uses system execution traces obtained from a Google cluster and from our in-house distributed system with varied behaviors, and shows the benefits of our approach as well as future research challenges.","PeriodicalId":244925,"journal":{"name":"2015 IEEE 34th Symposium on Reliable Distributed Systems (SRDS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127456003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PASS: An Address Space Slicing Framework for P2P Eclipse Attack Mitigation","authors":"Daniel Germanus, Hatem Ismail, N. Suri","doi":"10.1109/SRDS.2015.14","DOIUrl":"https://doi.org/10.1109/SRDS.2015.14","url":null,"abstract":"The decentralized design of Peer-to-Peer (P2P) protocols inherently provides for fault tolerance to non-malicious faults. However, the base P2P scalability and decentralization requirements often result in design choices that negatively impact their robustness to varied security threats. A prominent vulnerability are Eclipse attacks that aim at information hiding and consequently perturb a P2P overlay's reliable service delivery. Divergent lookups constitute an advocated mitigation technique but are size-limited to overlay networks with tens of thousands of peers. In this work, building upon divergent lookups, we propose a novel and scalable P2P address space slicing strategy (PASS) to efficiently mitigate attacks in overlays that host hundreds of thousands of peers. Moreover, we integrate and evaluate diversely designed lookup variants to assess their network overhead and mitigation rates. The proposed PASS approach shows mitigation rates reaching up to 100%.","PeriodicalId":244925,"journal":{"name":"2015 IEEE 34th Symposium on Reliable Distributed Systems (SRDS)","volume":"311 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133750045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Securing Passive Replication through Verification","authors":"Bruno Vavala, N. Neves, P. Steenkiste","doi":"10.1109/SRDS.2015.38","DOIUrl":"https://doi.org/10.1109/SRDS.2015.38","url":null,"abstract":"We show how to leverage trusted computing technology to design an efficient fully-passive replicated system tolerant to arbitrary failures. The system dramatically reduces the complexity of a fault-tolerant service, in terms of protocols, messages, data processing and non-deterministic operations. Our replication protocol enables the execution of a single protected service, replicating only its state, while allowing the backup replicas to check the correctness of the results. We implemented our protocol on Trusted Computing (TC) technology and compared it with two recent replication systems.","PeriodicalId":244925,"journal":{"name":"2015 IEEE 34th Symposium on Reliable Distributed Systems (SRDS)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116359579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Denial of Service Elusion (DoSE): Keeping Clients Connected for Less","authors":"Paul C. Wood, Christopher N. Gutierrez, S. Bagchi","doi":"10.1109/SRDS.2015.31","DOIUrl":"https://doi.org/10.1109/SRDS.2015.31","url":null,"abstract":"Denial of Service (DoS) attacks continue to grow in magnitude, duration, and frequency increasing the demand for techniques to protect services from disruption, especially at a low cost. We present Denial of Service Elusion (DoSE) as an inexpensive method for mitigating network layer attacks by utilizing cloud infrastructure and content delivery networks to protect services from disruption. DoSE uses these services to create a relay network between the client and the protected service that evades attack by selectively releasing IP address information. DoSE incorporates client reputation as a function of prior behavior to stop attackers along with a feedback controller to limit costs. We evaluate DoSE by modeling relays, clients, and attackers in an agent-based MATLAB simulator. The results show DoSE can mitigate a single-insider attack on 1,000 legitimate clients in 3.9 minutes while satisfying an average of 88.2% of requests during the attack.","PeriodicalId":244925,"journal":{"name":"2015 IEEE 34th Symposium on Reliable Distributed Systems (SRDS)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131832974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yan Tang, Jianwei Yin, Wei Lo, Ying Li, Shuiguang Deng, Kexiong Dong, C. Pu
{"title":"MICS: Mingling Chained Storage Combining Replication and Erasure Coding","authors":"Yan Tang, Jianwei Yin, Wei Lo, Ying Li, Shuiguang Deng, Kexiong Dong, C. Pu","doi":"10.1109/SRDS.2015.25","DOIUrl":"https://doi.org/10.1109/SRDS.2015.25","url":null,"abstract":"High reliability, low space cost, and efficient read/write performance are all desirable properties for cloud storage systems. Due to the inherent conflicts, however, simultaneously achieving optimality on these properties is unrealistic. Since reliable storage is indispensable prerequisite for services with high availability, tradeoff should therefore be made between space and read/write efficiency when storage scheme is designed. N-way Replication and Erasure Coding, two extensively-used storage schemes with high reliability, adopt opposite strategies on this tradeoff issue. However, unbalanced tradeoff designs of both schemes confine their effectiveness to limited types of workloads and system requirements. To mitigate such applicability penalty, we propose MICS, a MIngling Chained Storage scheme that combines structural and functional advantages from both N-way replication and erasure coding. Qualitatively, MICS provides efficient read/write performance and high reliability at reasonably low space cost. MICS stores each object in two forms: a full copy and certain amount of erasure-coded segments. We establish dedicated read/write protocols for MICS leveraging the unique structural advantages. Moreover, MICS provides high read/write efficiency with Pipeline Random-Access Memory consistency to guarantee reasonable semantics for services users. Evaluation results demonstrate that under same fault tolerance and consistency level, MICS outperforms N-way replication and pure erasure coding in I/O throughput by up to 34.1% and 51.3% respectively. Furthermore, MICS shows superior performance stability over diverse workload conditions, in which case the standard deviation of MICS is 70.1% and 29.3% smaller than those of other two schemes.","PeriodicalId":244925,"journal":{"name":"2015 IEEE 34th Symposium on Reliable Distributed Systems (SRDS)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130834350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Miani, B. Zarpelão, Bertrand Sobesto, M. Cukier
{"title":"A Practical Experience on Evaluating Intrusion Prevention System Event Data as Indicators of Security Issues","authors":"R. Miani, B. Zarpelão, Bertrand Sobesto, M. Cukier","doi":"10.1109/SRDS.2015.17","DOIUrl":"https://doi.org/10.1109/SRDS.2015.17","url":null,"abstract":"There are currently no generally accepted metrics for information security issues. One reason is the lack of validation using empirical data. In this practical experience report, we investigate whether metrics obtained from security devices used to monitor network traffic can be employed as indicators of security incidents. If so, security experts can use this information to better define priorities on security inspection and also to develop new rules for incident prevention. The metrics we investigate are derived from intrusion detection and prevention system (IDPS) alert events. We performed an empirical case study using IDPS data provided by a large organization of about 40,000 computers. The results indicate that characteristics of alerts can be used to depict trends in some security issues and consequently serve as indicators of security performance.","PeriodicalId":244925,"journal":{"name":"2015 IEEE 34th Symposium on Reliable Distributed Systems (SRDS)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114083267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
E. Anceaume, Yann Busnel, Nicolo Rivetti, B. Sericola
{"title":"Identifying Global Icebergs in Distributed Streams","authors":"E. Anceaume, Yann Busnel, Nicolo Rivetti, B. Sericola","doi":"10.1109/SRDS.2015.19","DOIUrl":"https://doi.org/10.1109/SRDS.2015.19","url":null,"abstract":"We consider the problem of identifying global iceberg attacks in massive and physically distributed streams. A global iceberg is a distributed denial of service attack, where some elements globally recur many times across the distributed streams, but locally, they do not appear as a deny of service. A natural solution to defend against global iceberg attacks is to rely on multiple routers that locally scan their network traffic, and regularly provide monitoring information to a server in charge of collecting and aggregating all the monitored information. Any relevant solution to this problem must minimise the communication between the routers and the coordinator, and the space required by each node to analyse its stream. We propose a distributed algorithm that tracks global icebergs on the fly with guaranteed error bounds, limited memory and processing requirements. We present a thorough analysis of our algorithm performance. In particular we derive a tight upper bound on the number of bits communicated between the multiple routers and the coordinator in presence of an oblivious adversary. Finally, we present the main results of the experiments we have run on a cluster of single-board computers. Those experiments confirm the efficiency and accuracy of our algorithm to track global icebergs hidden in very large input data streams exhibiting different shapes.","PeriodicalId":244925,"journal":{"name":"2015 IEEE 34th Symposium on Reliable Distributed Systems (SRDS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128423027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Separating the WHEAT from the Chaff: An Empirical Design for Geo-Replicated State Machines","authors":"João Sousa, A. Bessani","doi":"10.1109/SRDS.2015.40","DOIUrl":"https://doi.org/10.1109/SRDS.2015.40","url":null,"abstract":"State machine replication is a fundamental technique for implementing consistent fault-tolerant services. In the last years, several protocols have been proposed for improving the latency of this technique when the replicas are deployed in geographically-dispersed locations. In this work we evaluate some representative optimizations proposed in the literature by implementing them on an open-source state machine replication library and running the experiments in geographically-diverse PlanetLab nodes and Amazon EC2 regions. Interestingly, our results show that some optimizations widely used for improving the latency of geo-replicated state machines do not bring significant benefits, while others - not yet considered in this context - are very effective. Based on this evaluation, we propose WHEAT, a configurable crash and Byzantine fault-tolerant state machine replication library that uses the optimizations we observed as most effective in reducing SMR latency. WHEAT employs novel voting assignment schemes that, by using few additional spare replicas, enables the system to make progress without needing to access a majority of replicas. Our evaluation shows that a WHEAT system deployed in several Amazon EC2 regions presents a median latency up to 56% lower than a \"normal\" SMR protocol.","PeriodicalId":244925,"journal":{"name":"2015 IEEE 34th Symposium on Reliable Distributed Systems (SRDS)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121832959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ControlFreak: Signature Chaining to Counter Control Flow Attacks","authors":"Sergei Arnautov, C. Fetzer","doi":"10.1109/SRDS.2015.35","DOIUrl":"https://doi.org/10.1109/SRDS.2015.35","url":null,"abstract":"Many modern embedded systems use networks to communicate. This increases the attack surface: the adversary does not need to have physical access to the system and can launch remote attacks. By exploiting software bugs, the attacker might be able to change the behavior of a program. Security violations in safety-critical systems are particularly dangerous since they might lead to catastrophic results. Hence, safety-critical software requires additional protection. We present an approach to detect and prevent control flow attacks. Such attacks maliciously modify program's control flow to achieve the desired behavior. We develop ControlFreak, a hardware watchdog to monitor program execution and to prevent illegal control flow transitions. The watchdog employs chained signatures to detect any modification of the instruction stream and any illegal jump in the program even if signatures are maliciously modified.","PeriodicalId":244925,"journal":{"name":"2015 IEEE 34th Symposium on Reliable Distributed Systems (SRDS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126808306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PCM: A Parity-Check Matrix Based Approach to Improve Decoding Performance of XOR-based Erasure Codes","authors":"Yongzhe Zhang, Chentao Wu, Jie Li, M. Guo","doi":"10.1109/SRDS.2015.15","DOIUrl":"https://doi.org/10.1109/SRDS.2015.15","url":null,"abstract":"In large storage systems, erasure codes is a primary technique to provide high reliability with low monetary cost. Among various erasure codes, a major category called XORbased codes uses purely XOR operations to generate redundant data and offer low computational complexity. These codes are conventionally implemented via matrix based method or several specialized non-matrix based methods. However, these approaches are insufficient on decoding performance, which affects the reliability and availability of storage systems. To address the problem, in this paper, we propose a novel Parity-Check Matrix based (PCM) approach, which is a general-purpose method to implement XOR-based codes, and increases the decoding performance by using smaller and sparser matrices. To demonstrate the effectiveness of PCM, we conduct several experiments by using different XOR-based codes. The evaluation results show that, compared to typical matrix based decoding methods, PCM can improve the decoding speed by up to a factor of 1.5× when using EVENODD code (an erasure code for correcting double disk failures), and accelerate the decoding process of STAR code (an erasure code for correcting triple disk failures) by up to a factor of 2.4×.","PeriodicalId":244925,"journal":{"name":"2015 IEEE 34th Symposium on Reliable Distributed Systems (SRDS)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127014388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}