Timothy Tsai, S. Hari, Michael B. Sullivan, Oreste Villa, S. Keckler
{"title":"NVBitFI: Dynamic Fault Injection for GPUs","authors":"Timothy Tsai, S. Hari, Michael B. Sullivan, Oreste Villa, S. Keckler","doi":"10.1109/DSN48987.2021.00041","DOIUrl":"https://doi.org/10.1109/DSN48987.2021.00041","url":null,"abstract":"GPUs have found wide acceptance in domains such as high-performance computing and autonomous vehicles, which require fast processing of large amounts of data along with provisions for reliability, availability, and safety. A key component of these dependability characteristics is the propagation of errors and their eventual effect on system outputs. In addition to analytical and simulation models, fault injection is an important technique that can evaluate the effect of errors on a complete computing system running the full software stack. However, the complexity of modern GPU systems and workloads challenges existing fault injection tools. Some tools require the recompilation of source code that may not be available, struggle to handle dynamic libraries, lack support for modern GPUs, or add unacceptable performance overheads. We introduce the NVBitFI tool for fault injection into GPU programs. In contrast with existing tools, NVBitFI performs instrumentation of code dynamically and selectively to instrument the minimal set of target dynamic kernels; as it requires no access to source code, NVBitFI provides improvements in performance and usability. The NVBitFI tool is publicly available for download and use at https://github.com/NVlabs/nvbitfi.","PeriodicalId":222512,"journal":{"name":"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126663744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comprehensive Study of Bugs in Software Defined Networks","authors":"Ayush Bhardwaj, Zhenyu Zhou, Theophilus A. Benson","doi":"10.1109/DSN48987.2021.00026","DOIUrl":"https://doi.org/10.1109/DSN48987.2021.00026","url":null,"abstract":"Software-defined networking (SDN) enables innovative and impressive solutions in the networking domain by decoupling the control plane from the data plane. In an SDN environment, the network control logic for load balancing, routing, and access control is written in software running on a decoupled control plane. As with any software development cycle, the SDN control plane is prone to bugs that impact the network’s performance and availability. Yet, as a community, we lack holistic, in-depth studies of bugs within the SDN ecosystem. A bug taxonomy is one of the most promising ways to lay the foundations required for (1) evaluating and directing emerging research directions on fault detection and recovery, and (2) informing operational practices of network administrators. This paper takes the first step towards laying this foundation by providing a comprehensive study and analysis of over 500 ‘critical’ bugs (including $sim 150$ with manual analysis) in three of the most widely-used SDN controllers, i.e., FAUCET, ONOS, and CORD. We create a taxonomy of these SDN bugs, analyze their operational impact, and implications for the developers. We use our taxonomy to analyze the effectiveness and coverage of several prominent SDN fault tolerance and diagnosis techniques. This study is the first of its kind in scale and coverage to the best of our knowledge.","PeriodicalId":222512,"journal":{"name":"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123384720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conservative Confidence Bounds in Safety, from Generalised Claims of Improvement & Statistical Evidence","authors":"K. Salako, L. Strigini, Xingyu Zhao","doi":"10.1109/DSN48987.2021.00055","DOIUrl":"https://doi.org/10.1109/DSN48987.2021.00055","url":null,"abstract":"“Proven-in-use”, “globally-at-least-equivalent”, “stress-tested”, are concepts that come up in diverse contexts in acceptance, certification or licensing of critical systems. Their common feature is that dependability claims for a system in a certain operational environment are supported, in part, by evidence – viz of successful operation – concerning different, though related, system[s] and/or environment[s], together with an auxiliary argument that the target system/environment offers the same, or improved, safety. We propose a formal probabilistic (Bayesian) organisation for these arguments. Through specific examples of evidence for the “improvement” argument above, we demonstrate scenarios in which formalising such arguments substantially increases confidence in the target system, and show why this is not always the case. Example scenarios concern vehicles and nuclear plants. Besides supporting stronger claims, the mathematical formalisation imposes precise statements of the bases for “improvement” claims: seemingly similar forms of prior beliefs are sometimes revealed to imply substantial differences in the claims they can support.","PeriodicalId":222512,"journal":{"name":"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121134422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast IPv6 Network Periphery Discovery and Security Implications","authors":"Xiang Li, Baojun Liu, Xiaofeng Zheng, Haixin Duan, Qi Li, Youjun Huang","doi":"10.1109/DSN48987.2021.00025","DOIUrl":"https://doi.org/10.1109/DSN48987.2021.00025","url":null,"abstract":"Numerous measurement researches have been performed to discover the IPv4 network security issues by leveraging the fast Internet-wide scanning techniques. However, IPv6 brings the 128-bit address space and renders brute-force network scanning impractical. Although significant efforts have been dedicated to enumerating active IPv6 hosts, limited by technique efficiency and probing accuracy, large-scale empirical measurement studies under the increasing IPv6 networks are infeasible now. To fill this research gap, by leveraging the extensively adopted IPv6 address allocation strategy, we propose a novel IPv6 network periphery discovery approach. Specifically, XMap, a fast network scanner, is developed to find the periphery, such as a home router. We evaluate it on twelve prominent Internet service providers and harvest 52M active peripheries. Grounded on these found devices, we explore IPv6 network risks of the unintended exposed security services and the flawed traffic routing strategies. First, we demonstrate the unintended exposed security services in IPv6 networks, such as DNS, and HTTP, have become emerging security risks by analyzing 4.7M peripheries. Second, by inspecting the periphery’s packet routing strategies, we present the flawed implementations of IPv6 routing protocol affecting 5.8M router devices. Attackers can exploit this common vulnerability to conduct effective routing loop attacks, inducing DoS to the ISP’s and home routers with an amplification factor of $gt {200}$. We responsibly disclose those issues to all involved vendors and ASes and discuss mitigation solutions. Our research results indicate that the security community should revisit IPv6 network strategies immediately.","PeriodicalId":222512,"journal":{"name":"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114619339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Koustubha Bhat, E. V. D. Kouwe, H. Bos, Cristiano Giuffrida
{"title":"FIRestarter: Practical Software Crash Recovery with Targeted Library-level Fault Injection","authors":"Koustubha Bhat, E. V. D. Kouwe, H. Bos, Cristiano Giuffrida","doi":"10.1109/DSN48987.2021.00048","DOIUrl":"https://doi.org/10.1109/DSN48987.2021.00048","url":null,"abstract":"Despite advances in software testing, many bugs still plague deployed software, leading to crashes and thus service disruption in high-availability production applications. Existing crash recovery solutions are either limited to transient faults or require manual annotations to target predetermined persistent bugs. Moreover, existing solutions are generally inefficient, hindering practical deployment.In this paper, we present FIRestarter (Fault Injection-based Restarter), an efficient and automatic crash recovery solution for commodity user applications. To eliminate the need for manual annotations, FIRestarter injects targeted software faults at the library interface to automatically trigger error handling code for standard library calls already part of the application. In particular, when a crash occurs, we roll back the application state before the last recoverable library call, inject a fault, and restart execution forcing the call to immediately return a predetermined error code. This strategy allows the application to automatically bypass the crashing code upon such a restart and exploits existing error-handling code to recover from even persistent bugs. Moreover, since library calls lie pervasively throughout the code, our design provides a large recovery surface despite the automated approach. Finally, FIRestarter’s recovery windows are small and frequent compared to traditional checkpoint-restart, which enables new optimizations such as the ability to support rollback by means of hybrid hardware/software transactional memory instrumentation and improve performance. We apply FIRestarter to a number of event-driven server applications and show our solution achieves near-instantaneous, state-preserving crash recovery in the face of even persistent crashes. On popular web servers, our evaluation results show a recovery surface of at least 77%, with low performance overhead of at most 17%.","PeriodicalId":222512,"journal":{"name":"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130289511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joseph Connelly, Taylor Roberts, Xing Gao, Jidong Xiao, Haining Wang, A. Stavrou
{"title":"CloudSkulk: A Nested Virtual Machine Based Rootkit and Its Detection","authors":"Joseph Connelly, Taylor Roberts, Xing Gao, Jidong Xiao, Haining Wang, A. Stavrou","doi":"10.1109/DSN48987.2021.00047","DOIUrl":"https://doi.org/10.1109/DSN48987.2021.00047","url":null,"abstract":"When attackers compromise a computer system and obtain root control over the victim system, retaining that control and avoiding detection become their top priority. To achieve this goal, various rootkits have been developed. However, existing rootkits are still easy to detect as long as defenders can gain control at a lower level, such as the operating system level, the hypervisor level, or the hardware level. In this paper, we present a new type of rootkit called CloudSkulk, which is a nested virtual machine (VM) based rootkit. While nested virtualization has attracted sufficient attention from the security and cloud community, to the best of our knowledge, we are the first to reveal and demonstrate how nested virtualization can be used by attackers to develop rootkits. We then, from defenders’ perspective, present a novel approach to detecting CloudSkulk rootkits at the host level. Our experimental results show that the proposed approach is effective in detecting CloudSkulk rootkits.","PeriodicalId":222512,"journal":{"name":"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129626603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The IFIP 10.4 WG Announces the Jean-Claude Laprie Award Winners for 2021","authors":"","doi":"10.1109/dsn48987.2021.00015","DOIUrl":"https://doi.org/10.1109/dsn48987.2021.00015","url":null,"abstract":"","PeriodicalId":222512,"journal":{"name":"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124044309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seongkyeong Kwon, Seunghoon Woo, Gangmo Seong, Heejo Lee
{"title":"OCTOPOCS: Automatic Verification of Propagated Vulnerable Code Using Reformed Proofs of Concept","authors":"Seongkyeong Kwon, Seunghoon Woo, Gangmo Seong, Heejo Lee","doi":"10.1109/DSN48987.2021.00032","DOIUrl":"https://doi.org/10.1109/DSN48987.2021.00032","url":null,"abstract":"Addressing vulnerability propagation has become a major issue in software ecosystems. Existing approaches hold the promise of detecting widespread vulnerabilities but cannot be applied to verify effectively whether propagated vulnerable code still poses threats. We present OCTOPOCS, which uses a reformed Proof-of-Concept (PoC), to verify whether a vulnerability is propagated. Using context-aware taint analysis, OCTOPOCS extracts crash primitives (the parts used in the shared code area between the original vulnerable software and propagated software) from the original PoC. OCTOPOCS then utilizes directed symbolic execution to generate guiding inputs that direct the execution of the propagated software from the entry point to the shared code area. Thereafter, OCTOPOCS creates a new PoC by combining crash primitives and guiding inputs. It finally verifies the propagated vulnerability using the created PoC. We evaluated OCTOPOCS with 15 real-world C and C++ vulnerable software pairs, with results showing that OCTOPOCS successfully verified 14 propagated vulnerabilities.","PeriodicalId":222512,"journal":{"name":"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131281616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shahid Khan, Matthias Volk, J. Katoen, Alexis Braibant, M. Bouissou
{"title":"Model Checking the Multi-Formalism Language FIGARO","authors":"Shahid Khan, Matthias Volk, J. Katoen, Alexis Braibant, M. Bouissou","doi":"10.1109/DSN48987.2021.00056","DOIUrl":"https://doi.org/10.1109/DSN48987.2021.00056","url":null,"abstract":"This paper presents a probabilistic model-checking tool for FIGARO, a multi-formalism modelling language that includes e.g., generalised stochastic Petri nets, Boolean-logic driven Markov processes, telecommunication networks, dynamic reliability block diagrams, process diagrams, and electric circuits. FIGARO has been developed and maintained by EDF for the analysis of system dependability such as reliability, availability and maintainability. We present a probabilistic model-checking tool for FIGARO models. It combines efficient, fully automated verification algorithms with numerical analysis techniques. Whereas the existing FIGARO tools, the Monte Carlo simulator YAMS and the most-probable-sequence explorer FiGSEQ, provide respectively statistical guarantees and upper bounds for unreliability and unavailability, our tool provides hard guarantees: its results are correct up to a given numerical accuracy. The key ingredient is the tool-component FiGAROAPI that enables the state-space generation for FIGARO models thus facilitating model checking. This paper describes the details of FiGAROAPI and empirically evaluates the feasibility and merits of the proposed framework. FiGAROAPI leverages upon the state-of-the-art STORM model checker as back-end, and it can model check various types of formalism in their FIGARO representation.","PeriodicalId":222512,"journal":{"name":"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124884319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"K2: Reading Quickly from Storage Across Many Datacenters","authors":"Khiem Ngo, Haonan Lu, Wyatt Lloyd","doi":"10.1109/DSN48987.2021.00034","DOIUrl":"https://doi.org/10.1109/DSN48987.2021.00034","url":null,"abstract":"The infrastructure available to large-scale and medium-scale web services now spans dozens of geographically dispersed datacenters. Deploying across many datacenters has the potential to significantly reduce end-user latency by serving users nearer their location. However, deploying across many datacenters requires the backend storage system be partially replicated. In turn, this can sacrifice the low latency benefits of many datacenters, especially when a storage system provides guarantees on what operations will observe. We present the K2 storage system that provides lower latency for large-scale and medium-scale web services using partial replication of data over many datacenters with strong guarantees: causal consistency, read-only transactions, and write-only transactions. K2 provides the best possible worst-case latency for partial replication, a single round trip to remote datacenters, and often avoids sending any requests to far away datacenters using a novel replication approach, write-only transaction algorithm, and read-only transaction algorithm.","PeriodicalId":222512,"journal":{"name":"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128607759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}