{"title":"Mitigating the impact of ambient noise on Wireless Mesh Networks using adaptive link-quality-based packet replication","authors":"Jesus Friginal, Juan-Carlos Ruiz-Garcia, D. Andrés, Antonio Bustos","doi":"10.1109/DSN.2012.6263918","DOIUrl":"https://doi.org/10.1109/DSN.2012.6263918","url":null,"abstract":"Wireless Mesh Networks (WMNs) typically rely on proactive routing protocols to establish optimal communication routes between every pair of system nodes. These protocols integrate link-quality-based mechanisms to minimise the adverse effect of ambient noise on communications. This paper shows the limitations of such mechanisms by analysing the impact of ambient noise on three state-of-the-art proactive routing protocols: OLSR, B.A.T.M.A.N. and Babel. As will be shown, the lack of context-awareness in their link-quality mechanisms prevents the protocols from adjusting their behaviour according to persistent levels of ambient noise, which may vary over time. Consequently, they cannot minimise the impact of such noise on the availability of network routes. This issue is very serious for a WMN, since the loss of communication links may significantly increase the convergence time of the network. An adaptive extension to the studied link-quality-based mechanisms is proposed to avoid the loss of communication links in the presence of high levels of ambient noise. The effectiveness of the proposal is experimentally assessed, thus establishing a new method to reduce the impact of ambient noise on WMNs.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"36 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123484039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Practical scrubbing: Getting to the bad sector at the right time","authors":"George Amvrosiadis, Alina Oprea, Bianca Schroeder","doi":"10.1109/DSN.2012.6263919","DOIUrl":"https://doi.org/10.1109/DSN.2012.6263919","url":null,"abstract":"Latent sector errors (LSEs) are a common hard disk failure mode, where disk sectors become inaccessible while the rest of the disk remains unaffected. To protect against LSEs, commercial storage systems use scrubbers: background processes verifying disk data. The efficiency of different scrubbing algorithms in detecting LSEs has been studied in depth; however, no attempts have been made to evaluate or mitigate the impact of scrubbing on application performance. We provide the first known evaluation of the performance impact of different scrubbing policies in implementation, including guidelines on implementing a scrubber. To lessen this impact, we present an approach giving conclusive answers to the questions: when should scrubbing requests be issued, and at what size, to minimize impact and maximize scrubbing throughput for a given workload. Our approach achieves six times more throughput, and up to three orders of magnitude less slowdown than the default Linux I/O scheduler.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114534751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding soft error propagation using Efficient vulnerability-driven fault injection","authors":"Xin Xu, Man-Lap Li","doi":"10.1109/DSN.2012.6263923","DOIUrl":"https://doi.org/10.1109/DSN.2012.6263923","url":null,"abstract":"Extreme CMOS scaling is expected to significantly impact the reliability of future microprocessors, prompting recent research efforts on low-cost hardware-software cross-layer reliability solutions. To evaluate such solutions, statistical fault injection (SFI) is often used to estimate the error coverage of the underlying method. Unfortunately, because a significant number of errors injected by SFI are often derated, the evaluation becomes less rigorous and less efficient. This paper makes the observation that many derated errors can be gracefully avoided, allowing the fault injection campaign to focus on likely non-derated faults that stress the method under test. We propose a biased injection framework called CriticalFault that employs vulnerability analysis to map out relevant faults for stress testing. With CriticalFault, our results show that the injection space is reduced by 29% and that 59% of the biased injections cause either software aborts or silent data corruptions, both of which are improvements over SFI. Moreover, we characterize different propagation behaviors of these non-derated faults and discuss the implications for designing future cross-layer solutions. Overall, not only is CriticalFault highly effective in identifying relevant test cases for current systems, but reliability researchers and engineers can also use it to conduct more in-depth and meaningful analysis when developing future reliability solutions.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"70 49","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120888706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward comprehensive and accurate simulation performance prediction of parallel file systems","authors":"M. Erazo, Ting Li, Jason Liu, S. Eidenbenz","doi":"10.1109/DSN.2012.6263930","DOIUrl":"https://doi.org/10.1109/DSN.2012.6263930","url":null,"abstract":"We present the design and implementation of FileSim, a simulation framework with detailed models of parallel file systems, capable of reproducing complex I/O behavior at scale. FileSim aims to support comprehensive and accurate end-to-end I/O performance prediction and evaluation of exascale high-end computing systems. To this end, FileSim provides several key features, including detailed, pluggable models of contemporary parallel file systems, support for trace-driven simulation, and the capability of running large-scale I/O systems using parallel and distributed simulation. We conducted extensive validation and performance studies, through which we show that the simulator is capable of reproducing important I/O system behaviors comparable to those measured from the real systems. We demonstrate the capabilities of FileSim as a tool for exploring the parameter space and design alternatives of large-scale parallel file systems.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125974116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ExtraTime: Modeling and analysis of wearout due to transistor aging at microarchitecture-level","authors":"Fabian Oboril, M. Tahoori","doi":"10.1109/DSN.2012.6263957","DOIUrl":"https://doi.org/10.1109/DSN.2012.6263957","url":null,"abstract":"With shrinking feature sizes, transistor aging due to NBTI and HCI becomes a major reliability challenge for microprocessors. These processes lead to increased gate delays, more failures during runtime and eventually reduced operational lifetime. Currently, to ensure correct functionality for a certain operational lifetime, additional timing margins are added to the design. However, this approach implies a significant performance loss and may fail to meet reliability requirements. Therefore, aging-aware microarchitecture design is necessary. In this paper we present ExtraTime, a novel microarchitectural aging analysis framework that can be used to model, analyze, and predict performance, power and aging in early design phases, when detailed transistor-level information is not yet available. Furthermore, as a case study, we use ExtraTime to comprehensively investigate various clock and power gating strategies as well as aging-aware instruction scheduling policies, showing the impact of the architecture on aging.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"59 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134577343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NINEPIN: Non-invasive and energy efficient performance isolation in virtualized servers","authors":"P. Lama, Xiaobo Zhou","doi":"10.1109/DSN.2012.6263956","DOIUrl":"https://doi.org/10.1109/DSN.2012.6263956","url":null,"abstract":"A virtualized data center faces the important but challenging issue of performance isolation among heterogeneous customer applications. Performance interference resulting from the contention of shared resources among co-located virtual servers has a significant impact on the dependability of application QoS. We propose and develop NINEPIN, a non-invasive and energy efficient performance isolation mechanism that mitigates performance interference among heterogeneous applications hosted in virtualized servers. It is capable of increasing data center utility. Its novel hierarchical control framework aligns performance isolation goals with the incentive to regulate the system towards optimal operating conditions. The framework combines machine-learning-based self-adaptive modeling of performance interference and energy consumption, utility-optimization-based performance targeting, and robust model-predictive-control-based target tracking. We implement NINEPIN on a virtualized HP ProLiant blade server hosting SPEC CPU2006 and RUBiS benchmark applications. Experimental results demonstrate that NINEPIN outperforms a representative performance isolation approach, Q-Clouds, improving the overall system utility and reducing energy consumption.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122368110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Perspectives on software safety case development for unmanned aircraft","authors":"E. Denney, Ganesh J. Pai, I. Habli","doi":"10.1109/DSN.2012.6263939","DOIUrl":"https://doi.org/10.1109/DSN.2012.6263939","url":null,"abstract":"We describe our experience with the ongoing development of a safety case for an unmanned aircraft system (UAS), emphasizing autopilot software safety assurance. Our approach combines formal and non-formal reasoning, yielding a semi-automatically assembled safety case, in which part of the argument for autopilot software safety is automatically generated from formal methods. This paper discusses our experiences pertaining to (a) the methodology for creating and structuring safety arguments containing heterogeneous reasoning and information, (b) the comprehensibility of, and the confidence in, the arguments created, and (c) the implications for development and safety assurance processes. The considerations for assuring aviation software safety, when using an approach such as the one in this paper, are also discussed in the context of the relevant standards and existing (process-based) certification guidelines.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126785984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BLOCKWATCH: Leveraging similarity in parallel programs for error detection","authors":"Jiesheng Wei, K. Pattabiraman","doi":"10.1109/DSN.2012.6263959","DOIUrl":"https://doi.org/10.1109/DSN.2012.6263959","url":null,"abstract":"The scaling of silicon devices has exacerbated the unreliability of modern computer systems, and power constraints have necessitated the involvement of software in hardware error detection. Simultaneously, the multi-core revolution has impelled software to become parallel. Therefore, there is a compelling need to protect parallel programs from hardware errors. Parallel programs' tasks have significant similarity in control data due to the use of high-level programming models. In this study, we propose BLOCKWATCH to leverage the similarity in parallel programs' control data for detecting hardware errors. BLOCKWATCH statically extracts the similarity among different threads of a parallel program and checks the similarity at runtime. We evaluate BLOCKWATCH on seven SPLASH-2 benchmarks to measure its performance overhead and error detection coverage. We find that BLOCKWATCH incurs an average overhead of 16% across all programs, and provides an average SDC coverage of 97% for faults in the control data.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121574970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Binary mutation testing through dynamic translation","authors":"Markus Becker, C. Kuznik, Mabel M. Joy, Tao Xie, W. Müller","doi":"10.1109/DSN.2012.6263914","DOIUrl":"https://doi.org/10.1109/DSN.2012.6263914","url":null,"abstract":"This paper presents a novel mutation-based testing method through binary mutation. For this, a table of mutants is derived by control flow analysis of a disassembled binary under test. Mutations are injected at runtime by dynamic translation. Thus, our approach relies neither on source code nor on a particular compiler. As instrumentation is avoided, testing results correspond to the original binary. In addition to high-level language faults, the proposed approach captures target-specific faults related to compiling and linking. We investigated the software of an automotive case study. For this, a taxonomy of mutation operators for the ARM instruction set is proposed. Our experimental results demonstrate 100% accuracy with respect to confidence metrics provided by conventional testing methods while avoiding significant mutant compilation overhead. Further speedup is achieved by an efficient binary mutation testing framework that relies on extending the open source software emulator QEMU.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"168 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115173853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust data sharing with key-value stores","authors":"Cristina Basescu, C. Cachin, Ittay Eyal, R. Haas, A. Sorniotti, M. Vukolic, Ido Zachevsky","doi":"10.1145/1993806.1993843","DOIUrl":"https://doi.org/10.1145/1993806.1993843","url":null,"abstract":"A key-value store (KVS) offers functions for storing and retrieving values associated with unique keys. KVSs have become the most popular way to access Internet-scale “cloud” storage systems. We present an efficient wait-free algorithm that emulates multi-reader multi-writer storage from a set of potentially faulty KVS replicas in an asynchronous environment. Our implementation serves an unbounded number of clients that use the storage concurrently. It tolerates crashes of a minority of the KVSs and crashes of any number of clients. Our algorithm minimizes the space overhead at the KVSs and comes in two variants providing regular and atomic semantics, respectively. Compared with prior solutions, it is inherently scalable and allows clients to write concurrently. Because of the limited interface of a KVS, textbook-style solutions for reliable storage either do not work or incur a prohibitively large storage overhead. Our algorithm maintains two copies of the stored value per KVS in the common case, and we show that this is indeed necessary. If there are concurrent write operations, the maximum space complexity of the algorithm grows in proportion to the point contention. A series of simulations explore the behavior of the algorithm, and benchmarks obtained with KVS cloud-storage providers demonstrate its practicality.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126516015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}