{"title":"Processor-Level Selective Replication","authors":"Nithin Nakka, K. Pattabiraman, R. Iyer","doi":"10.1109/DSN.2007.75","DOIUrl":"https://doi.org/10.1109/DSN.2007.75","url":null,"abstract":"We propose a processor-level technique called selective replication, by which the application can choose where in its application stream and to what degree it requires replication. Recent work on static analysis and fault-injection-based experiments on applications reveals that certain variables in the application are critical to its crash- and hang-free execution. If it can be ensured that only the computation of these variables is error-free, then a high degree of crash/hang coverage can be achieved at a low performance overhead to the application. The selective replication technique provides an ideal platform for validating this claim. The technique is compared against complete duplication as provided in current architecture-level techniques. The results show that with about 59% less overhead than full duplication, selective replication detects 97% of the data errors and 87% of the instruction errors that were covered by full duplication. It also reduces the detection of errors benign to the final outcome of the application by 17.8% as compared to full duplication.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133952097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dependability Assessment of Grid Middleware","authors":"N. Looker, Jie Xu","doi":"10.1109/DSN.2007.31","DOIUrl":"https://doi.org/10.1109/DSN.2007.31","url":null,"abstract":"Dependability is a key factor in any software system due to the potential costs in both time and money a failure may cause. Given the complexity of grid applications that rely on dependable grid middleware, tools for the assessment of grid middleware are highly desirable. Our past research, based around our fault injection technology (FIT) framework and its implementation, WS-FIT, has demonstrated that network level fault injection can be a valuable tool in assessing the dependability of traditional Web services. Here we apply our FIT framework to Globus grid middleware using grid-FIT, our new implementation of the FIT framework, to obtain middleware dependability assessment data. We conclude by demonstrating that grid-FIT can be applied to Globus grid systems to assess dependability as part of a fault removal mechanism and thus allow middleware dependability to be increased.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134431669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Cookie Usage Setting with CookiePicker","authors":"C. Yue, Mengjun Xie, Haining Wang","doi":"10.1109/DSN.2007.21","DOIUrl":"https://doi.org/10.1109/DSN.2007.21","url":null,"abstract":"HTTP cookies have been widely used for maintaining session states, personalizing, authenticating, and tracking user behaviors. Despite their importance and usefulness, cookies have raised public concerns on Internet privacy because they can be exploited by Web sites to track and build user profiles. In addition, stolen cookies may also incur security problems. However, current web browsers lack secure and convenient mechanisms for cookie management. A cookie management scheme, which is easy-to-use and has minimal privacy risk, is in great demand; but designing such a scheme is a challenge. In this paper, we introduce CookiePicker, a system that can automatically validate the usefulness of cookies from a Web site and set the cookie usage permission on behalf of users. CookiePicker helps users achieve the maximum benefit brought by cookies, while minimizing the possible privacy and security risks. We implement CookiePicker as an extension to Firefox Web browser, and obtain promising results in the experiments.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134629541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On a Modeling Framework for the Analysis of Interdependencies in Electric Power Systems","authors":"S. Chiaradonna, P. Lollini, F. Giandomenico","doi":"10.1109/DSN.2007.68","DOIUrl":"https://doi.org/10.1109/DSN.2007.68","url":null,"abstract":"Nowadays, economy, security and quality of life heavily depend on the resiliency of a number of critical infrastructures, including the electric power system (EPS), through which vital services are provided. In existing EPS two cooperating infrastructures are involved: the electric infrastructure (EI) for the electricity generation and transportation to final users, and its information-technology based control system (ITCS) devoted to controlling and regulating the EI physical parameters and triggering reconfigurations in emergency situations. This paper proposes a modeling framework to capture EI and ITCS aspects, focusing on their interdependencies that contributed to the occurrence of several cascading failures in the past 40 years. A quite detailed analysis of the EI and ITCS structure and behavior is performed; in particular, the ITCS and EI behaviors are described by discrete and hybrid-state processes, respectively. To substantiate the approach, the implementation of a few basic modeling mechanisms inside an existing multiformalism/multi-solution tool is also discussed.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"151 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133419131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The National Programme for Information Technology in the UK Health Service: Dependability Challenges and Strategies","authors":"B. Randell","doi":"10.1109/DSN.2007.93","DOIUrl":"https://doi.org/10.1109/DSN.2007.93","url":null,"abstract":"The National Health Service (NHS) provides the majority of health-care in the UK. Its main section, that for England, serves a population of over 50 million, employs 40,000 general practitioners (family physicians), 80,000 other doctors, and 350,000 nurses, and includes over 300 hospitals. Its National Programme for Information Technology (NPfIT) is the largest civil IT project in the world. (Estimates of its total cost have ranged from £6.2 billion up to £20 billion.) This project, which was launched in 2002, aims to implement electronic care records for all patients and to provide a reliable and secure information service, for medical records, radiography, patient administration, etc., for all the hospitals, and all general practitioners' premises, to which all the NHS health professionals in England will have strictly-controlled access. This Special Plenary Session will provide an overview of NPfIT, and its dependability challenges and strategies. Speakers will, it is hoped, include representatives of Connecting for Health (the NHS Agency responsible for NPfIT), the medical profession, and the dependability research community.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125159487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HyParView: A Membership Protocol for Reliable Gossip-Based Broadcast","authors":"J. Leitao, J. Pereira, L. Rodrigues","doi":"10.1109/dsn.2007.56","DOIUrl":"https://doi.org/10.1109/dsn.2007.56","url":null,"abstract":"Gossip, or epidemic, protocols have emerged as a powerful strategy to implement highly scalable and resilient reliable broadcast primitives. Due to scalability reasons, each participant in a gossip protocol maintains a partial view of the system. The reliability of the gossip protocol depends upon some critical properties of these views, such as degree distribution and clustering coefficient. Several algorithms have been proposed to maintain partial views for gossip protocols. In this paper, we show that under a high number of faults, these algorithms take a long time to restore the desirable view properties. To address this problem, we present HyParView, a new membership protocol to support gossip-based broadcast that ensures high levels of reliability even in the presence of high rates of node failure. The HyParView protocol is based on a novel approach that relies on the use of two distinct partial views, which are maintained with different goals by different strategies.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129401951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"R-Sentry: Providing Continuous Sensor Services against Random Node Failures","authors":"Shengchao Yu, Yanyong Zhang","doi":"10.1109/DSN.2007.79","DOIUrl":"https://doi.org/10.1109/DSN.2007.79","url":null,"abstract":"The success of sensor-driven applications is reliant on whether a steady stream of data can be provided by the underlying system. This need, however, poses great challenges to sensor systems, mainly because the sensor nodes from which these systems are built have extremely short lifetimes. In order to extend the lifetime of the networked system beyond the lifetime of an individual sensor node, a common practice is to deploy a large array of sensor nodes and, at any time, have only a minimal set of nodes active performing duties while others stay in sleep mode to conserve energy. With this rationale, random node failures, either from active nodes or from redundant nodes, can seriously disrupt system operations. To address this need, we propose R-Sentry, which attempts to bound the service loss duration due to node failures, by coordinating the schedules among redundant nodes. Our simulation results show that compared to PEAS, a popular node scheduling algorithm, R-Sentry can provide a continuous 95% coverage through bounded recoveries from frequent node failures, while prolonging the lifetime of a sensor network by roughly 30%.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"143 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132743543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performability Models for Multi-Server Systems with High-Variance Repair Durations","authors":"H. Schwefel, Imad Antonios","doi":"10.1109/DSN.2007.73","DOIUrl":"https://doi.org/10.1109/DSN.2007.73","url":null,"abstract":"We consider cluster systems with multiple nodes where each server is prone to run tasks at a degraded level of service due to some software or hardware fault. The cluster serves tasks generated by remote clients, which are potentially queued at a dispatcher. We present an analytic queueing model of such systems, represented as an M/MMPP/1 queue, and derive and analyze exact numerical solutions for the mean and tail-probabilities of the queue-length distribution. The analysis shows that the distribution of the repair time is critical for these performability metrics. Additionally, in the case of high-variance repair times, the model reveals so-called blow-up points, at which the performance characteristics change dramatically. Since this blow-up behavior is sensitive to a change in model parameters, it is critical for system designers to be aware of the conditions under which it occurs. Finally, we present simulation results that demonstrate the robustness of this qualitative blow-up behavior towards several model variations.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131059033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Quality of Service of Crash-Recovery Failure Detectors","authors":"Tiejun Ma, J. Hillston, S. Anderson","doi":"10.1109/DSN.2007.70","DOIUrl":"https://doi.org/10.1109/DSN.2007.70","url":null,"abstract":"In this paper, we study and model a crash-recovery target and its failure detector's probabilistic behavior. We extend quality of service (QoS) metrics to measure the recovery detection speed and the proportion of the detected failures of a crash-recovery failure detector. Then the impact of the dependability of the crash-recovery target on the QoS bounds for such a crash-recovery failure detector is analysed by adopting general dependability metrics such as MTTF and MTTR. In addition, we analyse how to estimate the failure detector's parameters to achieve the QoS from a requirement based on Chen's NFD-S algorithm. We also demonstrate how to execute the configuration procedure of this crash-recovery failure detector. The simulations are based on the revised NFD-S algorithm with various MTTF and MTTR. The simulation results show that the dependability of a recoverable monitored target could have significant impact on the QoS of such a failure detector and match our analysis results.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129520007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BlackJack: Hard Error Detection with Redundant Threads on SMT","authors":"E. Schuchman, T. N. Vijaykumar","doi":"10.1109/DSN.2007.23","DOIUrl":"https://doi.org/10.1109/DSN.2007.23","url":null,"abstract":"Testing is a difficult process that becomes more difficult with scaling. With smaller and faster devices, tolerance for errors shrinks and devices may act correctly under certain conditions and not under others. As such, hard errors may exist but are only exercised by very specific machine state and signal pathways. Targeting these errors is difficult, and creating test cases that cover all machine states and pathways is not possible. In addition, new complications during burn-in may mean latent hard errors are not exposed in the fab and reach the customer before becoming active. To address this problem, we propose an architecture we call BlackJack that allows hard errors to be detected using redundant threads running on a single SMT core. This technique provides a safety-net that catches hard errors that were either latent during test or just not covered by the test cases at all. Like SRT, our technique works by executing redundant copies and verifying that their resulting machine states agree. Unlike SRT, BlackJack is able to achieve high hard error instruction coverage by executing redundant threads on different front and backend resources in the pipeline. We show that for a 15% performance penalty over SRT, BlackJack achieves 97% hard error instruction coverage compared to SRT's 35%.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132634296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}