{"title":"Energy-Efficient and Fault-Tolerant Structural Health Monitoring in Wireless Sensor Networks","authors":"Md. Zakirul Alam Bhuiyan, Jiannong Cao, Guojun Wang, Xuefeng Liu","doi":"10.1109/SRDS.2012.26","DOIUrl":"https://doi.org/10.1109/SRDS.2012.26","url":null,"abstract":"Wireless sensor networks (WSNs) have become an increasingly compelling platform for structural health monitoring (SHM) due to relatively low-cost, easy installation, etc. However, the challenge of effectively monitoring structural health condition (e.g., damage) under WSN constraints (e.g., limited energy, narrow bandwidth) and sensor faults has not been studied before. In this paper, we focus on tolerating sensor faults in WSN-based SHM. We design a distributed WSN framework for SHM and then examine its ability to cope with sensor faults. We bring attention to an undiscovered yet interesting fact, i.e., the real measured signals introduced by faulty sensors may cause an undamaged location to be identified as damaged (false positive) or a damaged location as undamaged (false negative) diagnosis. This can be caused by faults in sensor bonding, precision degradation, amplification gain, bias, drift, noise, and so forth. We present a distributed algorithm to detect such types of faults, and offer an online signal reconstruction algorithm to recover from the wrong diagnosis. Through simulations and a WSN prototype system, we evaluate the effectiveness of our proposed algorithms.","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115719020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TAIRO: Trust-Aware Automatic Incremental Routing for Opportunistic Resource Utilization Networks","authors":"Joseph W. Baird","doi":"10.1109/SRDS.2012.73","DOIUrl":"https://doi.org/10.1109/SRDS.2012.73","url":null,"abstract":"Oppnets, or Opportunistic Resource Utilization Networks, are a kind of ad hoc computer network that grows by discovering and integrating previously unencountered devices or systems with desirable resources (Oppnets are a predecessor and a generalization of opportunistic networks proposed by others). We present work in progress on TAIRO or Trust-aware Automatic Incremental Routing for Oppnets. TAIRO is based on the innovative AIR routing scheme. AIR uses specially designed prefix labels to manage routing in an ad hoc network, which is especially well-suited for meeting the goals of Oppnets.","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116441320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Quick and Reliable Routing for Infrastructure Surveillance with Wireless Sensor Networks","authors":"Zhen Jiang, Jie Wu, M. Guo, Zhenping Zhao, Donghong Wu","doi":"10.1109/SRDS.2012.83","DOIUrl":"https://doi.org/10.1109/SRDS.2012.83","url":null,"abstract":"In many applications, WSNs are deployed to monitor the impact of the forces of nature on the infrastructure safety, e.g. bridge collapse detection [6]. It is very important for the routing to send surveillance results without any unnecessary delay, which can be caused by extra transmissions in a detour or an unexpected wait for the availability of the relay successor and the corresponding link connection.","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"185 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122771093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reliable Distributed Real-Time and Embedded Systems through Safe Middleware Adaptation","authors":"Akshay Dabholkar, A. Dubey, A. Gokhale, G. Karsai, N. Mahadevan","doi":"10.1109/SRDS.2012.59","DOIUrl":"https://doi.org/10.1109/SRDS.2012.59","url":null,"abstract":"Distributed real-time and embedded (DRE) systems are a class of real-time systems formed through a composition of predominantly legacy, closed and statically scheduled real-time subsystems, which comprise over-provisioned resources to deal with worst-case failure scenarios. The formation of the system-of-systems leads to a new range of faults that manifest at different granularities for which no statically defined fault tolerance scheme applies. Thus, dynamic and adaptive fault tolerance mechanisms are needed which must execute within the available resources without compromising the safety and timeliness of existing real-time tasks in the individual subsystems. To address these requirements, this paper describes a middleware solution called Safe Middleware Adaptation for Real-Time Fault Tolerance (SafeMAT), which opportunistically leverages the available slack in the over-provisioned resources of individual subsystems. SafeMAT comprises three primary artifacts: (1) a flexible and configurable distributed, runtime resource monitoring framework that can pinpoint in real-time the available slack in the system that is used in making dynamic and adaptive fault tolerance decisions, (2) a safe and resource aware dynamic failure adaptation algorithm that enables efficient recovery from different granularities of failures within the available slack in the execution schedule while ensuring real-time constraints are not violated and resources are not overloaded, and (3) a framework that empirically validates the correctness of the dynamic mechanisms and the safety of the DRE system. Experimental results evaluating SafeMAT on an avionics application indicate that SafeMAT incurs only 9-15% runtime failover and 2-6% processor utilization overheads thereby providing safe and predictable failure adaptability in real-time.","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125662993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regenerating Codes: A System Perspective","authors":"Steve Jiekak, Anne-Marie Kermarrec, Nicolas Le Scouarnec, G. Straub, Alexandre van Kempen","doi":"10.1145/2506164.2506170","DOIUrl":"https://doi.org/10.1145/2506164.2506170","url":null,"abstract":"The explosion of the amount of data stored in cloud systems calls for more efficient paradigms for redundancy. While replication is widely used to ensure data availability, erasure correcting codes provide a much better trade-off between storage and availability. Regenerating codes are good candidates for they also offer low repair costs in terms of network bandwidth. While they have been proven optimal, they are difficult to understand and parameterize. In this paper we provide an analysis of regenerating codes for practitioners to grasp the various trade-offs. More specifically we make two contributions: (i) we study the impact of the parameters by conducting an analysis at the level of the system, rather than at the level of a single device, (ii) we compare the computational costs of various implementations of codes and highlight the most efficient ones. Our goal is to provide system designers with concrete information to help them choose the best parameters and design for regenerating codes.","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132390708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Private Anonymous Messaging","authors":"R. Fernando, B. Bhargava, M. Linderman","doi":"10.1109/SRDS.2012.51","DOIUrl":"https://doi.org/10.1109/SRDS.2012.51","url":null,"abstract":"Messaging systems where a user maintains a set of contacts and broadcasts messages to them are very common. We address the problem of a contact obtaining a message that it missed, from other contacts of the user while maintaining anonymity of all parties involved. We identify a set of requirements in addressing this problem and propose a modification to the hierarchical identity based encryption scheme proposed by Boneh et al. We briefly present an implementation of the proposed cryptographic primitives as a proof of concept.","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131290602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LogMaster: Mining Event Correlations in Logs of Large-Scale Cluster Systems","authors":"Xiaoyu Fu, Rui Ren, Jianfeng Zhan, Wei Zhou, Zhen Jia, Gang Lu","doi":"10.1109/SRDS.2012.40","DOIUrl":"https://doi.org/10.1109/SRDS.2012.40","url":null,"abstract":"This paper presents a set of innovative algorithms and a system, named LogMaster, for mining correlations of events that have multiple attributions, i.e., node ID, application ID, event type, and event severity, in logs of large-scale cloud and HPC systems. Different from traditional transactional data, e.g., supermarket purchases, system logs have their unique characteristics, and hence we propose several innovative approaches to mining their correlations. We parse logs into an n-ary sequence where each event is identified by an informative nine-tuple. We propose a set of enhanced apriori-like algorithms for improving sequence mining efficiency, we propose an innovative abstraction, event correlation graphs (ECGs), to represent event correlations, and present an ECGs-based algorithm for fast predicting events. The experimental results on three logs of production cloud and HPC systems, varying from 433490 entries to 4747963 entries, show that our method can predict failures with a high precision and an acceptable recall rate.","PeriodicalId":447700,"journal":{"name":"2012 IEEE 31st Symposium on Reliable Distributed Systems","volume":"59 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116383266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}