{"title":"A self-diagnosis technique using Reed-Solomon codes for self-repairing chips","authors":"Xiangyu Tang, Seongmoon Wang","doi":"10.1109/DSN.2009.5270327","DOIUrl":"https://doi.org/10.1109/DSN.2009.5270327","url":null,"abstract":"A self-diagnosis circuit that can be used for builtin self-repair is proposed. The circuit under diagnosis is assumed to be comprised of a large number of field repairable units (FRUs), which can be replaced with spares when they are found to be defective. Since the proposed self-diagnosis circuit is implemented on the chip, responses that are scanned out of scan chains are compressed first by the space compression circuit and then by the time compression circuit to reduce the volume of test response data. Both the space and the time compression circuit implement a Reed-Solomon code. Unlike prior work, in the proposed technique, responses of all FRUs are observed at the same time to reduce diagnosis time. The proposed diagnosis circuit can locate up to l defective FRUs. We propose a novel space-compression circuit that reduces hardware overhead by exploiting the frequency difference of the scan shift clock and the system clock. When the size of constituent multiple-input signature-register (MISR) is m, the total number of signatures to be stored for the fault-free signature is 2lmB bits, where 1 ≤ B ≤ m. The experimental results show that the proposed diagnosis circuit that can locate up to 4 defective FRUs in the same test session can be implemented with less than 1 % of hardware overhead for a large industrial design. Hardware overhead for the diagnosis circuit is lower for large CUDs.","PeriodicalId":376982,"journal":{"name":"2009 IEEE/IFIP International Conference on Dependable Systems & Networks","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115560670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decoupling Dynamic Information Flow Tracking with a dedicated coprocessor","authors":"Hari Kannan, Michael Dalton, C. Kozyrakis","doi":"10.1109/DSN.2009.5270347","DOIUrl":"https://doi.org/10.1109/DSN.2009.5270347","url":null,"abstract":"Dynamic Information Flow Tracking (DIFT) is a promising security technique. With hardware support, DIFT prevents a wide range of attacks on vulnerable software with minimal performance impact. DIFT architectures, however, require significant changes in the processor pipeline that increase design and verification complexity and may affect clock frequency. These complications deter hardware vendors from supporting DIFT. This paper makes hardware support for DIFT cost-effective by decoupling DIFT functionality onto a simple, separate coprocessor. Decoupling is possible because DIFT operations and regular computation need only synchronize on system calls. The coprocessor is a small hardware engine that performs logical operations and caches 4-bit tags. It introduces no changes to the design or layout of the main processor's logic, pipeline, or caches, and can be combined with various processors. Using a full-system hardware prototype and realistic Linux workloads, we show that the DIFT coprocessor provides the same security guarantees as current DIFT architectures with low runtime overheads.","PeriodicalId":376982,"journal":{"name":"2009 IEEE/IFIP International Conference on Dependable Systems & Networks","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126132964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power supply induced common cause faults-experimental assessment of potential countermeasures","authors":"Peter Tummeltshammer, A. Steininger","doi":"10.1109/DSN.2009.5270308","DOIUrl":"https://doi.org/10.1109/DSN.2009.5270308","url":null,"abstract":"Fault-tolerant architectures based on physical replication of components are vulnerable to faults that cause the same effect in all replica. Short outages in a power supply shared by all replica are a prominent example for such common cause faults. For systems in which the provision of a replicated power supply would cause prohibitive efforts the identification of reliable countermeasures against these effects is vital to maintain the required dependability level. In this paper we propose several of such countermeasures, namely parity protection, voltage monitoring and time diversity of the replica. We perform extensive fault injection experiments on three fault-tolerant dual core processor designs, one FPGA based and two commercial ASICs. These experiments provide evidence for the vulnerability of a completely unprotected dual core solution, while time diversity and voltage monitoring in combination with increased timing margins turn out particularly effective for eliminating common cause effects.","PeriodicalId":376982,"journal":{"name":"2009 IEEE/IFIP International Conference on Dependable Systems & Networks","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124783539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Jongerden, B. Haverkort, H. Bohnenkamp, J. Katoen
{"title":"Maximizing system lifetime by battery scheduling","authors":"M. Jongerden, B. Haverkort, H. Bohnenkamp, J. Katoen","doi":"10.1109/DSN.2009.5270351","DOIUrl":"https://doi.org/10.1109/DSN.2009.5270351","url":null,"abstract":"The use of mobile devices is limited by the battery lifetime. Some devices have the option to connect an extra battery, or to use smart battery-packs with multiple cells to extend the lifetime. In these cases, scheduling the batteries over the load to exploit recovery properties usually extends the system lifetime. Straightforward scheduling schemes, like round robin or choosing the best battery available, already provide a big improvement compared to a sequential discharge of the batteries. In this paper we compare these scheduling schemes with the optimal scheduling scheme produced with a priced-timed automaton battery model (implemented and evaluated in Uppaal Cora). We see that in some cases the results of the simple scheduling schemes are close to optimal. However, the optimal schedules also clearly show that there is still room for improving the battery lifetimes.","PeriodicalId":376982,"journal":{"name":"2009 IEEE/IFIP International Conference on Dependable Systems & Networks","volume":"7 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131325254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An energy efficient circuit level technique to protect register file from MBUs and SETs in embedded processors","authors":"M. Fazeli, Alireza Namazi, S. Miremadi","doi":"10.1109/DSN.2009.5270337","DOIUrl":"https://doi.org/10.1109/DSN.2009.5270337","url":null,"abstract":"This paper presents a circuit level soft errortolerant- technique, called RRC (Robust Register Caching), for the register file of embedded processors. The basic idea behind the RRC is to effectively cache the most vulnerable registers in a small highly robust register cache built by circuit level SEU and SET protected memory cells. To decide which cache entry should be replaced, the average number of read operations during a register ACE time is used as a criterion to judge. In fact, the victim cache entry is one which has the maximum read count. To minimize the power overhead of the RRC, the clock gating technique is efficiently exploited for the main register file resulting in significantly low power consumption. The RRC is able to protect the register file not only against Single Bit Upsets (SBUs) but also against Multiple Bit Upsets (MBUs) and Single Event Transients (SETs). The RRC is experimentally evaluated using the LEON processor. The experimental results show that, if the cache size is selected properly, the Architectural Vulnerability Factor (AVF) of the register file becomes about 1% while it imposes low power, area and performance overheads to the processor.","PeriodicalId":376982,"journal":{"name":"2009 IEEE/IFIP International Conference on Dependable Systems & Networks","volume":"51 89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124604573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the effectiveness of structural detection and defense against P2P-based botnets","authors":"Duc T. Ha, Guanhua Yan, S. Eidenbenz, H. Ngo","doi":"10.1109/DSN.2009.5270322","DOIUrl":"https://doi.org/10.1109/DSN.2009.5270322","url":null,"abstract":"Recently, peer-to-peer (P2P) networks have emerged as a covert communication platform for malicious programs known as bots. As popular distributed systems, they allow bots to communicate easily while protecting the botmaster from being discovered. Existing work on P2P-based botnets mainly focuses on measurement-based studies of botnet behaviors. In this work, through simulation, we study extensively the structure of P2P networks running Kademlia, one of a few widely used P2P protocols in practice. Our simulation testbed not only incorporates the actual code of a real Kademlia client software to achieve high realism, but also applies distributed event-driven simulation techniques to achieve high scalability. Using this testbed, we analyze the scaling, clustering, reachability, and various centrality properties of P2P-based botnets from a graph-theoretical perspective. We further demonstrate experimentally and theoretically that monitoring bot activities in a P2P network is difficult, suggesting that the P2P mechanism indeed helps botnets hide their communication effectively. Finally, we evaluate the effectiveness of some potential mitigation techniques, such as content poisoning, sybil-based and eclipse-based mitigation. Conclusions drawn from this work shed light on the structure of P2P botnets, how to monitor bot activities in P2P networks, and how to mitigate botnet operations effectively.","PeriodicalId":376982,"journal":{"name":"2009 IEEE/IFIP International Conference on Dependable Systems & Networks","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124636240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic content web applications: Crash, failover, and recovery analysis","authors":"L. E. Buzato, G. M. D. Vieira, W. Zwaenepoel","doi":"10.1109/DSN.2009.5270331","DOIUrl":"https://doi.org/10.1109/DSN.2009.5270331","url":null,"abstract":"This work assesses how crashes and recoveries affect the performance of a replicated dynamic content web application. RobustStore is the result of retrofitting TPC-W's on-line bookstore with Treplica, a middleware for building dependable applications. Implementations of Paxos and Fast Paxos are at the core of Treplica's efficient and programmer-friendly support for replication and recovery. The TPC-W benchmark, augmented with faultloads and dependability measures, is used to evaluate the behaviour of RobustStore. Experiments apply faultloads that cause sequential and concurrent replica crashes. RobustStore's performance drops by less than 13% during the recovery from two simultaneous replica crashes. When subject to an identical faultload and a shopping workload, a five-replicas RobustStore maintains an accuracy of 99.999%. Our results display not only good performance, total autonomy and uninterrupted availability, they also show that it is simple to develop efficient recovery-oriented applications using Treplica.","PeriodicalId":376982,"journal":{"name":"2009 IEEE/IFIP International Conference on Dependable Systems & Networks","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116454642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LFI: A practical and general library-level fault injector","authors":"P. Marinescu, George Candea","doi":"10.1109/DSN.2009.5270313","DOIUrl":"https://doi.org/10.1109/DSN.2009.5270313","url":null,"abstract":"Fault injection, a critical aspect of testing robust systems, is often overlooked in the development of general-purpose software. We believe this is due to the absence of easy-to-use tools and to the extensive manual labor required to perform fault injection tests. This paper introduces LFI (L ibrary Fault Injector), a tool that automates the preparation of fault scenarios and their injection at the boundary between shared libraries and applications. LFI extends prior work by automatically profiling fault behaviors of libraries via static analysis of their binaries, thus reducing the dependence on human labor and perfect documentation. We present techniques for automatically generating injection scenarios and we describe a simple language for expressing such scenarios. LFI does not require access to libraries' source code and works for Linux, Windows, and Solaris on x86 and SPARC platforms.","PeriodicalId":376982,"journal":{"name":"2009 IEEE/IFIP International Conference on Dependable Systems & Networks","volume":"491 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133869987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Galen Lyle, Shelley Cheny, K. Pattabiraman, Z. Kalbarczyk, R. Iyer
{"title":"An end-to-end approach for the automatic derivation of application-aware error detectors","authors":"Galen Lyle, Shelley Cheny, K. Pattabiraman, Z. Kalbarczyk, R. Iyer","doi":"10.1109/DSN.2009.5270291","DOIUrl":"https://doi.org/10.1109/DSN.2009.5270291","url":null,"abstract":"Critical Variable Recomputation (CVR) based error detection provides high coverage for data critical to an application while reducing the performance overhead associated with detecting benign errors. However, when implemented exclusively in software, the performance penalty associated with CVR based detection is unsuitably high. This paper addresses this limitation by providing a hybrid hardware/software tool chain which allows for the design of efficient error detectors while minimizing additional hardware. Detection mechanisms are automatically derived during compilation and mapped onto hardware where they are executed in parallel with the original task at runtime. When tested using an FPGA platform, results show that our approach incurs an area overhead of 53% while increasing execution time by 27% on average.","PeriodicalId":376982,"journal":{"name":"2009 IEEE/IFIP International Conference on Dependable Systems & Networks","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130275400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Graham, P. Strid, Scott Roy, Fernando Rodriguez
{"title":"A low-tech solution to avoid the severe impact of transient errors on the IP interconnect","authors":"D. Graham, P. Strid, Scott Roy, Fernando Rodriguez","doi":"10.1109/DSN.2009.5270301","DOIUrl":"https://doi.org/10.1109/DSN.2009.5270301","url":null,"abstract":"There are many sources of failure within a System-on-Chip (SoC), so it is important to look beyond the processor core at other components that affect the reliable operation of the SoC, such as the fabric included in every one that connects the IP together. We use ARM's AMBA 3 AXI bus matrix to demonstrate that the impact of errors on the IP interconnect can be severe: possibly causing deadlock or memory corruption. We consider the detection of 1-bit transient faults without changing the IP that connects to the bus matrix or the AMBA 3 standard and without adding extra latency while keeping the performance and area overhead low. We explore what can be done under these constraints and propose a combination of techniques for a low-tech solution to detect these rare events.","PeriodicalId":376982,"journal":{"name":"2009 IEEE/IFIP International Conference on Dependable Systems & Networks","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127851720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}