M. Aguilera, K. Keeton, A. Merchant, Kiran-Kumar Muniswamy-Reddy, Mustafa Uysal
{"title":"Improving Recoverability in Multi-tier Storage Systems","authors":"M. Aguilera, K. Keeton, A. Merchant, Kiran-Kumar Muniswamy-Reddy, Mustafa Uysal","doi":"10.1109/DSN.2007.57","DOIUrl":"https://doi.org/10.1109/DSN.2007.57","url":null,"abstract":"Enterprise storage systems typically contain multiple storage tiers, each having its own performance, reliability, and recoverability. The primary motivation for this multi-tier organization is cost, as storage tier costs vary considerably. In this paper, we describe a file system called TierFS that stores files at multiple storage tiers while providing high recoverability at all tiers. To achieve this goal, TierFS uses several novel techniques that leverage coupling between multiple tiers to reduce data loss, take consistent snapshots across tiers, provide continuous data protection, and improve recovery time. We evaluate TierFS with analytical models, showing that TierFS can provide better recoverability than a conventional design of similar cost.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121915516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Cinque, Domenico Cotroneo, Z. Kalbarczyk, R. Iyer
{"title":"How Do Mobile Phones Fail? A Failure Data Analysis of Symbian OS Smart Phones","authors":"M. Cinque, Domenico Cotroneo, Z. Kalbarczyk, R. Iyer","doi":"10.1109/DSN.2007.54","DOIUrl":"https://doi.org/10.1109/DSN.2007.54","url":null,"abstract":"While the new generation of hand-held devices, e.g., smart phones, support a rich set of applications, growing complexity of the hardware and runtime environment makes the devices susceptible to accidental errors and malicious attacks. Despite these concerns, very few studies have looked into the dependability of mobile phones. This paper presents measurement-based failure characterization of mobile phones. The analysis starts with a high level failure characterization of mobile phones based on data from publicly available web forums, where users post information on their experiences in using hand-held devices. This initial analysis is then used to guide the development of a failure data logger for collecting failure-related information on SymbianOS-based smart phones. Failure data is collected from 25 phones (in Italy and USA) over the period of 14 months. Key findings indicate that: (i) the majority of kernel exceptions are due to memory access violation errors (56%) and heap management problems (18%), and (ii) on average users experience a failure (freeze or self shutdown) every 11 days. While the study provide valuable insight into the failure sensitivity of smart-phones, more data and further analysis are needed before generalizing the results.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125952496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assurance Based Development of Critical Systems","authors":"P. Graydon, J. Knight, E. Strunk","doi":"10.1109/DSN.2007.17","DOIUrl":"https://doi.org/10.1109/DSN.2007.17","url":null,"abstract":"Assurance based development (ABD) is the synergistic construction of a critical computing system and an assurance case that sets out the dependability claims for the system and argues that the available evidence justifies those claims. Co-developing the system and its assurance case helps software developers to make technology choices that address the specific dependability goal of each component. This approach gives developers: (1) confidence that the technologies selected will support the system's dependability goal and (2) flexibility to deploy expensive technology, such as formal verification, only on components whose assurance needs demand it. ABD simplifies the detection - and thereby avoidance - of potential assurance difficulties as they arise, rather than after development is complete. In this paper, we present ABD together with a case study of its use.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"396 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116672809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robustness and Security Hardening of COTS Software Libraries","authors":"Martin Süßkraut, C. Fetzer","doi":"10.1109/DSN.2007.84","DOIUrl":"https://doi.org/10.1109/DSN.2007.84","url":null,"abstract":"COTS components, like software libraries, can be used to reduce the development effort. Unfortunately, many COTS components have been developed without a focus on robust- ness and security. We propose a novel approach to harden software libraries to improve their robustness and security. Our approach is automated, general and extensible and consists of the following stages. First, we use a static analysis to prepare and guide the following fault injection. In the dynamic analysis stage, fault injection experiments execute the library functions with both usual and extreme input values. The experiments are used to derive and verify one protection hypothesis per function (for instance, function foo fails if argument 1 is a NULL pointer). In the hardening stage, a protection wrapper is generated from these hypothesis to reject unrobust input values of library functions. We evaluate our approach by hardening a library used by Apache (a web server).","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131485488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Electing an Eventual Leader in an Asynchronous Shared Memory System","authors":"Antonio Fernández, Ernesto Jiménez, M. Raynal","doi":"10.1109/DSN.2007.39","DOIUrl":"https://doi.org/10.1109/DSN.2007.39","url":null,"abstract":"This paper considers the problem of electing an eventual leader in an asynchronous shared memory system. While this problem has received a lot of attention in message- passing systems, very few solutions have been proposed for shared memory systems. As an eventual leader cannot be elected in a pure asynchronous system prone to process crashes, the paper first proposes to enrich the asynchronous system model with an additional assumption. That assumption, denoted AWB, requires that after some time (1) there is a process whose write accesses to some shared variables are timely, and (2) the timers of the other processes are asymptotically well-behaved. The asymptotically well-behaved timer notion is a new notion that generalizes and weakens the traditional notion of timers whose durations are required to monotonically increase when the values they are set to increase. Then, the paper presents two A WB-based algorithms that elect an eventual leader. Both algorithms are independent of the value of t (the maximal number of processes that may crash). The first algorithm enjoys the following noteworthy properties: after some time only the elected leader has to write the shared memory, and all but one shared variables have a bounded domain, be the execution finite or infinite. This algorithm is consequently optimal with respect to the number of processes that have to write the shared memory. The second algorithm enjoys the following property: all the shared variables have a bounded domain. This is obtained at the following additional price: all the processes are required to forever write the shared memory. A theorem is proved which states that this price has to be paid by any algorithm that elects an eventual leader in a bounded shared memory model. This second algorithm is consequently optimal with respect to the number of processes that have to write in such a constrained memory model. In a very interesting way, these algorithms show an inherent tradeoff relating the number of processes that have to write the shared memory and the bounded/unbounded attribute of that memory.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115497060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SLAM: Sleep-Wake Aware Local Monitoring in Sensor Networks","authors":"Issa M. Khalil, S. Bagchi, N. Shroff","doi":"10.1109/DSN.2007.88","DOIUrl":"https://doi.org/10.1109/DSN.2007.88","url":null,"abstract":"Sleep-wake protocols are critical in sensor networks to ensure long-lived operation. However, an open problem is how to develop efficient mechanisms that can be incorporated with sleep-wake protocols to ensure both long-lived operation and a high degree of security. Our contribution in this paper is to address this problem by using local monitoring, a powerful technique for detecting and mitigating control and data attacks in sensor networks. In local monitoring, each node oversees part of the traffic going in and out of its neighbors to determine if the behavior is suspicious, such as, unusually long delay in forwarding a packet. Here, we present a protocol called SLAM to make local monitoring parsimonious in its energy consumption and to integrate it with any extant sleep-wake protocol in the network. The challenge is to enable sleep-wake in a secure manner even in the face of nodes that may be adversarial and not wake up nodes responsible for monitoring its traffic. We prove analytically that the security coverage is not weakened by the protocol. We perform simulations in ns-2 to demonstrate that the performance of local monitoring is practically unchanged while listening energy saving of 30 to 129 times is achieved, depending on the network load.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123111824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computing Battery Lifetime Distributions","authors":"L. Cloth, M. Jongerden, B. Haverkort","doi":"10.1109/DSN.2007.26","DOIUrl":"https://doi.org/10.1109/DSN.2007.26","url":null,"abstract":"The usage of mobile devices like cell phones, navigation systems, or laptop computers, is limited by the lifetime of the included batteries. This lifetime depends naturally on the rate at which energy is consumed, however, it also depends on the usage pattern of the battery. Continuous drawing of a high current results in an excessive drop of residual capacity. However, during intervals with no or very small currents, batteries do recover to a certain extend. We model this complex behaviour with an inhomogeneous Markov reward model, following the approach of the so-called kinetic battery model (KiBaM). The state-dependent reward rates thereby correspond to the power consumption of the attached device and to the available charge, respectively. We develop a tailored numerical algorithm for the computation of the distribution of the consumed energy and show how different workload patterns influence the overall lifetime of a battery.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121864003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Cost-Effective Dependable Microcontroller Architecture with Instruction-Level Rollback for Soft Error Recovery","authors":"T. Sakata, T. Hirotsu, H. Yamada, T. Kataoka","doi":"10.1109/DSN.2007.5","DOIUrl":"https://doi.org/10.1109/DSN.2007.5","url":null,"abstract":"A cost-effective, dependable microcontroller architecture has been developed. To detect soft errors, we developed an electronic design automation (EDA) tool that generates optimized soft error-detecting logic circuits for flip-flops. After a soft error is detected, the error detection signal goes to a developed rollback control module (RCM), which resets the CPU and restores the CPU's register file from the backup register file using a rollback program routine. After the routine, the CPU restarts from the instruction executed before the soft error occurred. In addition, there is a developed error reset module (ERM) that can restore the RCM from soft errors. We also developed an error correction module (ECM) that corrects ECC errors in RAM after error detection with no delay overheads. Testing on a 32- bit RISC microcontroller and EEMBC benchmarks showed that the area overhead was under 59% and frequency overhead was under 9%. In a soft error injection simulation, the MTBF of random logic circuits, and the MTBF of RAM were 30 and 1.34 times longer, respectively, than those of the original microcontroller.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122029147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Protecting Cryptographic Keys from Memory Disclosure Attacks","authors":"K. Harrison, Shouhuai Xu","doi":"10.1109/DSN.2007.77","DOIUrl":"https://doi.org/10.1109/DSN.2007.77","url":null,"abstract":"Cryptography has become an indispensable mechanism for securing systems, communications and applications. While offering strong protection, cryptography makes the assumption that cryptographic keys are kept absolutely secret. In general this assumption is very difficult to guarantee in real life because computers may be compromised relatively easily. In this paper we investigate a class of attacks, which exploit memory disclosure vulnerabilities to expose cryptographic keys. We demonstrate that the threat is real by formulating an attack that exposed the private key of an OpenSSH server within 1 minute, and exposed the private key of an Apache HTTP server within 5 minutes. We propose a set of techniques to address such attacks. Experimental results show that our techniques are efficient (i.e., imposing no performance penalty) and effective - unless a large portion of allocated memory is disclosed.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"32 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122707461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Reinforcement Learning Approach to Automatic Error Recovery","authors":"Qijun Zhu, Chun Yuan","doi":"10.1109/DSN.2007.11","DOIUrl":"https://doi.org/10.1109/DSN.2007.11","url":null,"abstract":"The increasing complexity of modern computer systems makes fault detection and localization prohibitively expensive, and therefore fast recovery from failures is becoming more and more important. A significant fraction of failures can be cured by executing specific repair actions, e.g. rebooting, even when the exact root causes are unknown. However, designing reasonable recovery policies to effectively schedule potential repair actions could be difficult and error prone. In this paper, we present a novel approach to automate recovery policy generation with reinforcement learning techniques. Based on the recovery history of the original user-defined policy, our method can learn a new, locally optimal policy that outperforms the original one. In our experimental work on data from a real cluster environment, we found that the automatically generated policy can save 10% of machine downtime.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126360353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}