{"title":"CADRE: Cycle-Accurate Deterministic Replay for Hardware Debugging","authors":"S. Sarangi, Brian Greskamp, J. Torrellas","doi":"10.1109/DSN.2006.19","DOIUrl":"https://doi.org/10.1109/DSN.2006.19","url":null,"abstract":"One of the main reasons for the difficulty of hardware verification is that hardware platforms are typically nondeterministic at clock-cycle granularity. Uninitialized state elements, I/O, and timing variations on high-speed buses all introduce nondeterminism that causes different behavior on different runs starting from the same initial state. To improve our ability to debug hardware, we would like to completely eliminate nondeterminism. This paper introduces the cycle-accurate deterministic replay (CADRE) architecture, which cost-effectively makes a board-level computer cycle-accurate deterministic. We characterize the sources of nondeterminism in computers and show how to address them. In particular, we introduce a novel scheme to ensure deterministic communication on source-synchronous buses that cross clock-domain boundaries. Experiments show that CADRE on a 4-way multiprocessor server enables cycle-accurate deterministic execution of one-second intervals with modest buffering requirements (around 200MB) and minimal performance loss (around 1%). Moreover, CADRE has modest hardware requirements","PeriodicalId":228470,"journal":{"name":"International Conference on Dependable Systems and Networks (DSN'06)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132228595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Y. Amir, C. Danilov, J. Kirsch, John Lane, D. Dolev, C. Nita-Rotaru, Josh Olsen, David Zage
{"title":"Scaling Byzantine Fault-Tolerant Replication toWide Area Networks","authors":"Y. Amir, C. Danilov, J. Kirsch, John Lane, D. Dolev, C. Nita-Rotaru, Josh Olsen, David Zage","doi":"10.1109/DSN.2006.63","DOIUrl":"https://doi.org/10.1109/DSN.2006.63","url":null,"abstract":"This paper presents the first hierarchical Byzantine fault-tolerant replication architecture suitable to systems that span multiple wide area sites. The architecture confines the effects of any malicious replica to its local site, reduces message complexity of wide area communication, and allows read-only queries to be performed locally within a site for the price of additional hardware. A prototype implementation is evaluated over several network topologies and is compared with a flat Byzantine fault-tolerant approach","PeriodicalId":228470,"journal":{"name":"International Conference on Dependable Systems and Networks (DSN'06)","volume":"243 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133834701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Andrés, Juan-Carlos Ruiz-Garcia, D. Gil, P. Gil
{"title":"Run-Time Reconfiguration for Emulating Transient Faults in VLSI Systems","authors":"D. Andrés, Juan-Carlos Ruiz-Garcia, D. Gil, P. Gil","doi":"10.1109/DSN.2006.62","DOIUrl":"https://doi.org/10.1109/DSN.2006.62","url":null,"abstract":"Advances in circuitry integration increase the probability of occurrence of transient faults in VLSI systems. A confident use of these systems requires the study of their behaviour in the presence of such faults. This study can be conducted using model-based fault injection techniques. In that context, field-programmable gate arrays (FPGAs) offer a great promise by enabling those techniques to execute models faster. This paper focuses on how run-time reconfiguration techniques can be used for emulating the occurrence of transient faults in VLSI models. Although the use of FPGAs for that purpose has been restricted so far to the well-known bit-flip fault model, recent studies in fault representativeness point out the need of considering a wider set of faults modelling aspects like delays, indeterminations and pulses. Therefore, the main goal of this study is to analyse the different alternatives that FPGAs offer for the emulation of these faults while greatly decreasing the time devoted to models execution","PeriodicalId":228470,"journal":{"name":"International Conference on Dependable Systems and Networks (DSN'06)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133340977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Tang, P. Carruthers, Zuheir Totari, Michael W. Shapiro
{"title":"Assessment of the Effect of Memory Page Retirement on System RAS Against Hardware Faults","authors":"D. Tang, P. Carruthers, Zuheir Totari, Michael W. Shapiro","doi":"10.1109/DSN.2006.13","DOIUrl":"https://doi.org/10.1109/DSN.2006.13","url":null,"abstract":"The Solaris 10 operating system includes a number of new features for predictive self-healing. One such feature is the ability of the fault management software to diagnose memory errors and drive automatic memory page retirement (MPR), intended to reduce the negative impact of permanent memory faults that generate either correctable or uncorrectable errors on system reliability, availability, and serviceability (RAS). The MPR technique allows memory pages suffering from correctable errors and relocatable clean pages suffering from uncorrectable errors to be removed from use in the virtual memory system without interrupting user applications. It also allows relocatable dirty pages associated with uncorrectable errors to be isolated with limited impact on affected user processes, avoiding an outage for the entire system. This study applies analytical models, with parameters calibrated by field experience, to quantify the reduction that can be made by this operating system self-healing technique on the system interruptions, yearly downtime, and number of services introduced by hardware permanent faults, for typical low-end and mid-range server systems. The results show that significant improvements can be made on these three system RAS metrics by deploying the MPR capability","PeriodicalId":228470,"journal":{"name":"International Conference on Dependable Systems and Networks (DSN'06)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115990236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"User Interface Defect Detection by Hesitation Analysis","authors":"R. Reeder, R. Maxion","doi":"10.1109/DSN.2006.71","DOIUrl":"https://doi.org/10.1109/DSN.2006.71","url":null,"abstract":"Delays and errors are the frequent consequences of people having difficulty with a user interface. Such delays and errors can result in severe problems, particularly for mission-critical applications in which speed and accuracy are of the essence. User difficulty is often caused by interface-design defects that confuse or mislead users. Current techniques for isolating such defects are time-consuming and expensive, because they require human analysts to identify the points at which users experience difficulty; only then can diagnosis and repair of the defects take place. This paper presents an automated method for detecting instances of user difficulty based on identifying hesitations during system use. The method's accuracy was evaluated by comparing its judgments of user difficulty with ground truth generated by human analysts. The method's accuracy at a range of threshold parameter values is given; representative points include 92% of periods of user difficulty identified (with a 35% false-alarm rate); 86% (24% false-alarm rate); and 20% (3% false-alarm rate). Applications of the method to addressing interface defects are discussed","PeriodicalId":228470,"journal":{"name":"International Conference on Dependable Systems and Networks (DSN'06)","volume":"152 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122042117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Static Analysis to Enforce Safe Value Flow in Embedded Control Systems","authors":"S. Kowshik, Grigore Roşu, L. Sha","doi":"10.1109/DSN.2006.66","DOIUrl":"https://doi.org/10.1109/DSN.2006.66","url":null,"abstract":"Embedded control systems consist of multiple components with different criticality levels interacting with each other. For example, in a passenger jet, the navigation system interacts with the passenger entertainment system in providing passengers the distance-to-destination information. It is imperative that failures in the non-critical subsystem should not compromise critical functionality. This architectural principle for robustness can, however, be easily compromised by implementation-level errors. We describe SafeFlow, which statically analyzes core components in the system to ensure that they use non-core values communicated through shared memory only if they are run-time monitored for safety or recoverability. Using simple, local annotations and semantic restrictions on shared memory usage in the core component, SafeFlow precisely identifies accesses to unmonitored non-core values. With a few false positives, it identifies erroneous dependencies of critical data on non-core values that can arise due to programming errors, inadvertent accesses, or wrong assumptions regarding the absence of difficult-to-detect implementation errors such as data races and synchronization. We demonstrate the utility of SafeFlow by applying it to discover critical value flow dependencies in three prototype systems","PeriodicalId":228470,"journal":{"name":"International Conference on Dependable Systems and Networks (DSN'06)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127386333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hemant Sengar, D. Wijesekera, Haining Wang, S. Jajodia
{"title":"VoIP Intrusion Detection Through Interacting Protocol State Machines","authors":"Hemant Sengar, D. Wijesekera, Haining Wang, S. Jajodia","doi":"10.1109/DSN.2006.73","DOIUrl":"https://doi.org/10.1109/DSN.2006.73","url":null,"abstract":"Being a fast-growing Internet application, voice over Internet protocol (VoIP) shares the network resources with the regular Internet traffic, and is susceptible to the existing security holes of the Internet. Moreover, given that voice communication is time sensitive and uses a suite of interacting protocols, VoIP exposes new forms of vulnerabilities to malicious attacks. In this paper, we propose a highly-needed VoIP intrusion detection system. Our approach is novel in that, it utilizes not only the state machines of network protocols but also the interaction among them for intrusion detection. This detection approach is particularly suited for protecting VoIP applications, in which a melange of protocols are involved to provide IP telephony services. Based on tracking deviations from interacting protocol state machines, our solution shows promising detection characteristics and low runtime impact on the perceived quality of voice streams","PeriodicalId":228470,"journal":{"name":"International Conference on Dependable Systems and Networks (DSN'06)","volume":"186 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132684851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Barbarians in the Gate: An Experimental Validation of NIC-based Distributed Firewall Performance and Flood Tolerance","authors":"Michael Ihde, W. Sanders","doi":"10.1109/DSN.2006.17","DOIUrl":"https://doi.org/10.1109/DSN.2006.17","url":null,"abstract":"This paper presents our experience validating the flood tolerance of two network interface card (NIC)-based embedded firewall solutions, the embedded firewall (EFW) and the autonomic distributed firewall (ADF). Experiments were performed for both embedded firewall devices to determine their flood tolerance and performance characteristics. The results show that both are vulnerable to packet flood attacks on a 100 Mbps network. In certain configurations, we found that both embedded firewall devices can have a significant, negative impact on bandwidth and application performance. These results imply first that, firewall rule-sets should be optimized for performance-sensitive applications, and second, that proper consideration must be given to attack risks and mitigations before either the EFW or ADF is deployed. Finally, we believe that future embedded firewall implementations should be vetted in a manner similar to that presented in this paper. Our experience shows that when their limitations are properly considered, both the EFW and ADF can be safely deployed to enhance network security without undue risk","PeriodicalId":228470,"journal":{"name":"International Conference on Dependable Systems and Networks (DSN'06)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115281974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Reconfigurable Generic Dual-Core Architecture","authors":"T. Kottke, A. Steininger","doi":"10.1109/DSN.2006.8","DOIUrl":"https://doi.org/10.1109/DSN.2006.8","url":null,"abstract":"In this paper we propose a generic frame for the implementation of a dual-core processor with two modes of operation. One is the safety mode that allows to run the two cores in lock step in a classical master/checker fashion. A clock delay of 1.5 clock cycles between master and checker establishes the temporal redundancy to minimize the potential for common mode faults. The second operation mode allows a parallel execution of different instruction streams on the two cores in a multiprocessor fashion. The possibility to dynamically switch between the two modes allows for an efficient utilization of the duplicated core. We propose an implementation of such a generic frame that can be applied in conjunction with virtually any standard processor core. Also we perform a systematic failure analysis for the safety mode and the mode switching procedure. Experimental fault injection confirms that our reconfigurable architecture indeed provides the same fail safe properties as the classical master/checker architecture","PeriodicalId":228470,"journal":{"name":"International Conference on Dependable Systems and Networks (DSN'06)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121487709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kaustubh R. Joshi, W. Sanders, M. Hiltunen, R. Schlichting
{"title":"Automatic Recovery Using Bounded Partially Observable Markov Decision Processes","authors":"Kaustubh R. Joshi, W. Sanders, M. Hiltunen, R. Schlichting","doi":"10.1109/DSN.2006.16","DOIUrl":"https://doi.org/10.1109/DSN.2006.16","url":null,"abstract":"This paper provides a technique, based on partially observable Markov decision processes (POMDPs), for building automatic recovery controllers to guide distributed system recovery in a way that provides provable assurances on the quality of the generated recovery actions even when the diagnostic information may be imprecise. Lower bounds on the cost of recovery are introduced and proved, and it is shown how the characteristics of the recovery process can be used to ensure that the lower bounds converge even on undiscounted models. The bounds used in an appropriate online controller provide it with provable termination properties. Simulation-based experimental results on a realistic e-commerce system demonstrate that the proposed bounds can be improved iteratively, and the resulting controller convincingly outperforms a controller that uses heuristics instead of bounds","PeriodicalId":228470,"journal":{"name":"International Conference on Dependable Systems and Networks (DSN'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129597715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}