{"title":"Improving Recoverability in Multi-tier Storage Systems","authors":"M. Aguilera, K. Keeton, A. Merchant, Kiran-Kumar Muniswamy-Reddy, Mustafa Uysal","doi":"10.1109/DSN.2007.57","DOIUrl":"https://doi.org/10.1109/DSN.2007.57","url":null,"abstract":"Enterprise storage systems typically contain multiple storage tiers, each having its own performance, reliability, and recoverability. The primary motivation for this multi-tier organization is cost, as storage tier costs vary considerably. In this paper, we describe a file system called TierFS that stores files at multiple storage tiers while providing high recoverability at all tiers. To achieve this goal, TierFS uses several novel techniques that leverage coupling between multiple tiers to reduce data loss, take consistent snapshots across tiers, provide continuous data protection, and improve recovery time. We evaluate TierFS with analytical models, showing that TierFS can provide better recoverability than a conventional design of similar cost.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121915516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How Do Mobile Phones Fail? A Failure Data Analysis of Symbian OS Smart Phones","authors":"M. Cinque, Domenico Cotroneo, Z. Kalbarczyk, R. Iyer","doi":"10.1109/DSN.2007.54","DOIUrl":"https://doi.org/10.1109/DSN.2007.54","url":null,"abstract":"While the new generation of hand-held devices, e.g., smart phones, support a rich set of applications, growing complexity of the hardware and runtime environment makes the devices susceptible to accidental errors and malicious attacks. Despite these concerns, very few studies have looked into the dependability of mobile phones. This paper presents measurement-based failure characterization of mobile phones. The analysis starts with a high level failure characterization of mobile phones based on data from publicly available web forums, where users post information on their experiences in using hand-held devices. This initial analysis is then used to guide the development of a failure data logger for collecting failure-related information on SymbianOS-based smart phones. Failure data is collected from 25 phones (in Italy and USA) over the period of 14 months. Key findings indicate that: (i) the majority of kernel exceptions are due to memory access violation errors (56%) and heap management problems (18%), and (ii) on average users experience a failure (freeze or self shutdown) every 11 days. 
While the study provides valuable insight into the failure sensitivity of smart phones, more data and further analysis are needed before generalizing the results.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125952496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assurance Based Development of Critical Systems","authors":"P. Graydon, J. Knight, E. Strunk","doi":"10.1109/DSN.2007.17","DOIUrl":"https://doi.org/10.1109/DSN.2007.17","url":null,"abstract":"Assurance based development (ABD) is the synergistic construction of a critical computing system and an assurance case that sets out the dependability claims for the system and argues that the available evidence justifies those claims. Co-developing the system and its assurance case helps software developers to make technology choices that address the specific dependability goal of each component. This approach gives developers: (1) confidence that the technologies selected will support the system's dependability goal and (2) flexibility to deploy expensive technology, such as formal verification, only on components whose assurance needs demand it. ABD simplifies the detection - and thereby avoidance - of potential assurance difficulties as they arise, rather than after development is complete. In this paper, we present ABD together with a case study of its use.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"396 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116672809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robustness and Security Hardening of COTS Software Libraries","authors":"Martin Süßkraut, C. Fetzer","doi":"10.1109/DSN.2007.84","DOIUrl":"https://doi.org/10.1109/DSN.2007.84","url":null,"abstract":"COTS components, like software libraries, can be used to reduce the development effort. Unfortunately, many COTS components have been developed without a focus on robustness and security. We propose a novel approach to harden software libraries to improve their robustness and security. Our approach is automated, general and extensible and consists of the following stages. First, we use a static analysis to prepare and guide the following fault injection. In the dynamic analysis stage, fault injection experiments execute the library functions with both usual and extreme input values. The experiments are used to derive and verify one protection hypothesis per function (for instance, function foo fails if argument 1 is a NULL pointer). In the hardening stage, a protection wrapper is generated from these hypotheses to reject unrobust input values of library functions. We evaluate our approach by hardening a library used by Apache (a web server).","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131485488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SLAM: Sleep-Wake Aware Local Monitoring in Sensor Networks","authors":"Issa M. Khalil, S. Bagchi, N. Shroff","doi":"10.1109/DSN.2007.88","DOIUrl":"https://doi.org/10.1109/DSN.2007.88","url":null,"abstract":"Sleep-wake protocols are critical in sensor networks to ensure long-lived operation. However, an open problem is how to develop efficient mechanisms that can be incorporated with sleep-wake protocols to ensure both long-lived operation and a high degree of security. Our contribution in this paper is to address this problem by using local monitoring, a powerful technique for detecting and mitigating control and data attacks in sensor networks. In local monitoring, each node oversees part of the traffic going in and out of its neighbors to determine if the behavior is suspicious, such as an unusually long delay in forwarding a packet. Here, we present a protocol called SLAM to make local monitoring parsimonious in its energy consumption and to integrate it with any extant sleep-wake protocol in the network. The challenge is to enable sleep-wake in a secure manner even in the face of nodes that may be adversarial and not wake up nodes responsible for monitoring its traffic. We prove analytically that the security coverage is not weakened by the protocol. 
We perform simulations in ns-2 to demonstrate that the performance of local monitoring is practically unchanged while listening energy savings of 30 to 129 times are achieved, depending on the network load.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123111824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Workshop on Dependable and Secure Nanocomputing","authors":"J. Arlat, R. Iyer, M. Nicolaidis","doi":"10.1109/DSN.2007.106","DOIUrl":"https://doi.org/10.1109/DSN.2007.106","url":null,"abstract":"The continuous advances and progress made in hardware technology make it possible to foresee a realm of unprecedented performance levels and new application-driven architectural designs, as evidenced by the recent announcement of an 80-core chip [1]. Nevertheless, the evolution of nanotechnologies raises serious challenges with respect to both dependability and security viewpoints. Issues at stake go far beyond developing protections with respect to accidental disturbances in operation; they also relate to the unreliability and variability that will characterize emerging nanoscale devices. Accounting for malicious threats targeting hardware circuits will constitute another increasing concern.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134218440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robustness Testing of the Windows DDK","authors":"Manuel Mendonça, N. Neves","doi":"10.1109/DSN.2007.85","DOIUrl":"https://doi.org/10.1109/DSN.2007.85","url":null,"abstract":"Modern computers interact with many kinds of external devices, which has led to a state where device drivers (DD) account for a substantial part of the operating system (OS) code. Currently, most system crashes can be attributed to DD because of flaws contained in their implementation. In this paper, we evaluate how well Windows protects itself from erroneous input coming from faulty drivers. Three Windows versions were considered in this study, Windows XP and 2003 Server, and the future Windows release Vista. Our results demonstrate that in general these OS are reasonably vulnerable, and that a few of the injected faults cause the system to hang or crash. Moreover, all of them handle bad inputs in a roughly equivalent manner, which is worrisome because it means that no major robustness enhancements are to be expected in the DD architecture of the next Windows Vista.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132441682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring Availability in Optimistic Partition-Tolerant Systems with Data Constraints","authors":"Mikael Asplund, S. Nadjm-Tehrani, S. Beyer, Pablo Galdámez","doi":"10.1109/DSN.2007.62","DOIUrl":"https://doi.org/10.1109/DSN.2007.62","url":null,"abstract":"Replicated systems that run over partitionable environments can exhibit increased availability if isolated partitions are allowed to optimistically continue their execution independently. This availability gain is traded against consistency, since several replicas of the same objects could be updated separately. Once partitioning terminates, divergences in the replicated state need to be reconciled. One way to reconcile the state consists of letting the application manually solve inconsistencies. However, there are several situations where automatic reconciliation of the replicated state is meaningful. We have implemented replication and automatic reconciliation protocols that can be used as building blocks in a partition-tolerant middleware. The novelty of the protocols is the continuous service of the application even during the reconciliation process. A prototype system is experimentally evaluated to illustrate the increased availability despite network partitions.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132299555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Electing an Eventual Leader in an Asynchronous Shared Memory System","authors":"Antonio Fernández, Ernesto Jiménez, M. Raynal","doi":"10.1109/DSN.2007.39","DOIUrl":"https://doi.org/10.1109/DSN.2007.39","url":null,"abstract":"This paper considers the problem of electing an eventual leader in an asynchronous shared memory system. While this problem has received a lot of attention in message-passing systems, very few solutions have been proposed for shared memory systems. As an eventual leader cannot be elected in a pure asynchronous system prone to process crashes, the paper first proposes to enrich the asynchronous system model with an additional assumption. That assumption, denoted AWB, requires that after some time (1) there is a process whose write accesses to some shared variables are timely, and (2) the timers of the other processes are asymptotically well-behaved. The asymptotically well-behaved timer notion is a new notion that generalizes and weakens the traditional notion of timers whose durations are required to monotonically increase when the values they are set to increase. Then, the paper presents two AWB-based algorithms that elect an eventual leader. Both algorithms are independent of the value of t (the maximal number of processes that may crash). The first algorithm enjoys the following noteworthy properties: after some time only the elected leader has to write the shared memory, and all but one shared variables have a bounded domain, be the execution finite or infinite. This algorithm is consequently optimal with respect to the number of processes that have to write the shared memory. The second algorithm enjoys the following property: all the shared variables have a bounded domain. This is obtained at the following additional price: all the processes are required to forever write the shared memory. 
A theorem is proved which states that this price has to be paid by any algorithm that elects an eventual leader in a bounded shared memory model. This second algorithm is consequently optimal with respect to the number of processes that have to write in such a constrained memory model. In a very interesting way, these algorithms show an inherent tradeoff relating the number of processes that have to write the shared memory and the bounded/unbounded attribute of that memory.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115497060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Cost-Effective Dependable Microcontroller Architecture with Instruction-Level Rollback for Soft Error Recovery","authors":"T. Sakata, T. Hirotsu, H. Yamada, T. Kataoka","doi":"10.1109/DSN.2007.5","DOIUrl":"https://doi.org/10.1109/DSN.2007.5","url":null,"abstract":"A cost-effective, dependable microcontroller architecture has been developed. To detect soft errors, we developed an electronic design automation (EDA) tool that generates optimized soft error-detecting logic circuits for flip-flops. After a soft error is detected, the error detection signal goes to a developed rollback control module (RCM), which resets the CPU and restores the CPU's register file from the backup register file using a rollback program routine. After the routine, the CPU restarts from the instruction executed before the soft error occurred. In addition, there is a developed error reset module (ERM) that can restore the RCM from soft errors. We also developed an error correction module (ECM) that corrects ECC errors in RAM after error detection with no delay overheads. Testing on a 32-bit RISC microcontroller and EEMBC benchmarks showed that the area overhead was under 59% and frequency overhead was under 9%. 
In a soft error injection simulation, the MTBF of random logic circuits and the MTBF of RAM were 30 and 1.34 times longer, respectively, than those of the original microcontroller.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122029147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}