{"title":"Emergence: A New Source of Failures in Complex Systems","authors":"Lorenzo Vinerbi, A. Bondavalli, P. Lollini","doi":"10.1109/DEPEND.2010.28","DOIUrl":"https://doi.org/10.1109/DEPEND.2010.28","url":null,"abstract":"The paper focuses on emergent systems, i.e., complex systems for which it is impossible to foresee the overall system’s behavior from the composition/interaction of the single functions/services performed by its components. The lack of knowledge on some aspects of the system, which is the source of the emergence, is an unavoidable aspect that should be explicitly taken into account in all the phases of the system lifecycle. In this paper we introduce the concept of emergence as a new source of failures in complex systems, and we discuss the advantages in considering emergence during requirement specifications, both in terms of perception of the complex systems and in terms of proactive actions to prevent system failures.","PeriodicalId":447746,"journal":{"name":"2010 Third International Conference on Dependability","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115019231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification and Impact Analysis of Faults in Automated System Management","authors":"Barry McLarnon, P. Robinson, P. Sage, P. Milligan","doi":"10.1109/DEPEND.2010.34","DOIUrl":"https://doi.org/10.1109/DEPEND.2010.34","url":null,"abstract":"The reliability of automated system management solutions will increase in importance as the use of cloud computing and data centres expands. As part of a study to improve reliability, this paper provides a classification of faults that can occur in automated system management and proposes a method for determining the severity of such faults. A baseline deployment is compared with an alternate proposed configuration to determine the difference in reliability. The results gained show a significant improvement over the baseline. While it is still in development, the method is able to determine and compare the reliability of deployment configurations from early in the design process.","PeriodicalId":447746,"journal":{"name":"2010 Third International Conference on Dependability","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131651208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"System Fault Behavior Model Considering the Effects of Structural Factors and Method of Its Description","authors":"Guangyan Zhao, K. Rui, Yufeng Sun, Zhao Gang","doi":"10.1109/DEPEND.2010.26","DOIUrl":"https://doi.org/10.1109/DEPEND.2010.26","url":null,"abstract":"System dependability research is gradually making the transit of focusing on “fault time” to focusing on “fault process”. The researches on dependability models of micro failure mechanism and macro fault behavior have become hot topics. The system fault behavior model describes the occurrence and development process of system fault in term of the behavior, and also presents the effect of basic unit failure on the whole system. The factors influencing the system fault behavior include the intrinsic factors such as material, structure, etc. and the extrinsic factors such as usage mode, environment, human factors, etc. The interactions between different types of factors lead to the complexity of the fault behavior models. On the basis of the frame of fault behavior model, this paper builds the system fault behavior model considering the effects of structural factors in term of the intrinsic factors. This paper adopts extended Petri Nets models to describe such model so that realize simulating the fault process. This method builds dependability model from a new angle, pays attention to process depicting, and describes the rules of the occurrence and development of system fault, which is a strong complement to the existing dependability theory. Under the condition, this paper puts forward the fault behavior model based on structure and its description method, which offers a kind of feasible technique to the frame of the fault behavior theory.","PeriodicalId":447746,"journal":{"name":"2010 Third International Conference on Dependability","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134520880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advanced Policies for the Administrative Delegation in Federated Environments","authors":"M. Pérez, Gabriel López, A. Skarmeta, A. Pasic","doi":"10.1109/DEPEND.2010.20","DOIUrl":"https://doi.org/10.1109/DEPEND.2010.20","url":null,"abstract":"In existing federated identity management systems it is more and more necessary new set of advanced policies, such as policies for the administrative delegation. They allow administrators to delegate a subset of the system policies management to other users, who will have a much wider knowledge in the application area where these policies will be applied. In this paper, we present an infrastructure that manages the complete life cycle of the administrative delegation policies, as well as a way for reducing the complexity in their management for some scenarios, where these users do not have to be experts in the subject area. These users will only have to fill in a simple template, which is automatically generated from the administrative policy created by the administrator.","PeriodicalId":447746,"journal":{"name":"2010 Third International Conference on Dependability","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123233776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Handling Crash and Software Faults Efficiently in Distributed Event Stream Processing","authors":"Andrey Brito, Stefan Weigert, Martin Süßkraut, C. Fetzer, P. Felber","doi":"10.1109/DEPEND.2010.32","DOIUrl":"https://doi.org/10.1109/DEPEND.2010.32","url":null,"abstract":"Active replication is a common approach to handle failures in distributed systems, including Event Stream Processing (ESP) systems. However, one weakness of conventional active replication is that replicas, being equal and in the same state, are susceptible to common-mode crashes due to software bugs. We propose a new approach to active replication that assumes a failure model stronger than fail-stop but weaker than models permitting arbitrary failures. We combine transactional memory and extended runtime checking to achieve: (i) low processing latency in failure-free runs by allowing downstream nodes to use speculative results and, thus, to circumvent the overhead added by the extended runtime checks; (ii) reduce the MTTR by enabling localized rollbacks (with word granularity) in several cases. We show that major limitations of n-variant active replication (e.g., multi-threading support, complex and slow recovery) can be overcome and tolerance to software bugs is orthogonal to Byzantine fault tolerance.","PeriodicalId":447746,"journal":{"name":"2010 Third International Conference on Dependability","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125299080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FTDIS: A Fault Tolerant Dynamic Instruction Scheduling","authors":"Roza Ghamari, Amir Rajabzadeh","doi":"10.1109/DEPEND.2010.13","DOIUrl":"https://doi.org/10.1109/DEPEND.2010.13","url":null,"abstract":"In this work, we target the robustness for controller scheduler of type Tomasulo for SEU faults model. The proposed fault-tolerant dynamic scheduling unit is named FTDIS, in which critical control data of scheduler is protected from driving to an unwanted stage using Triple Modular Redundancy and majority voting approaches. Moreover, the feedbacks in voters produce recovery capability for detected faults in the FTDIS, enabling both fault mask and recovery for system. As the results of analytical evaluations demonstrate, the implemented FTDIS unit has over 99% fault detection coverage in the condition of existing less than 4 faults in critical bits. Furthermore, based on experiments, the FTDIS has a 200% hardware overhead comparing to the primitive dynamic scheduling control unit and about 50% overhead in comparision to a full CPU core. The proposed unit also has no performance penalty during simulation. In addition, the experiments show that FTDIS consumes 98% more power than the primitive unit.","PeriodicalId":447746,"journal":{"name":"2010 Third International Conference on Dependability","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126242051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Software-Implemented Hardware Error Detection: Costs and Gains","authors":"Ute Schiffel, A. Schmitt, Martin Süßkraut, C. Fetzer","doi":"10.1109/DEPEND.2010.16","DOIUrl":"https://doi.org/10.1109/DEPEND.2010.16","url":null,"abstract":"Commercial off-the-shelf (COTS) hardware is becoming less and less reliable because of the continuously decreasing feature sizes of integrated circuits. But due to economic constraints, more and more critical systems will be based on basically unreliable COTS hardware. Usually in such systems redundant execution is used to detect erroneous executions. However, arithmetic codes promise much higher error detection rates. Yet, they are generally assumed to generate very large slowdowns. In this paper, we assess and compare the runtime overhead and error detection capabilities of redundancy and several arithmetic codes. Our results demonstrate a clear trade-off between runtime costs and gained safety. However, unexpectedly the runtime costs for arithmetic codes compared to redundancy increase only linearly, while the gained safety increases exponentially.","PeriodicalId":447746,"journal":{"name":"2010 Third International Conference on Dependability","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128502279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coordination and Deployment of Mobile Agents on Dependable Systems","authors":"I. Satoh","doi":"10.1109/DEPEND.2010.29","DOIUrl":"https://doi.org/10.1109/DEPEND.2010.29","url":null,"abstract":"This paper presents a framework for enabling mobile agents to be organized dynamically and autonomously with two unique compositions and interagent interactions on dependable distributed systems. The first enable an agent to contain other agents inside it and migrate to another agent or computer with its inner agents. It provides a powerful approach to composing and deploying large-scale mobile software. The second enable an agent to define its destination according to another agent's location with several policies to support adaptation on distributed systems. It also introduces several higher-level coordinations between mobile agents, e.g., master-slave and redundancy, which are useful to implement dependable systems. This paper also describes a prototype implementation of the framework with mobile agent technology and several applications of it.","PeriodicalId":447746,"journal":{"name":"2010 Third International Conference on Dependability","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127398253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RMAP: A Reliability-Aware Application Mapping for Network-on-Chips","authors":"A. Patooghy, H. Tabkhi, S. Miremadi","doi":"10.1109/DEPEND.2010.25","DOIUrl":"https://doi.org/10.1109/DEPEND.2010.25","url":null,"abstract":"This paper proposes a reliability-aware application mapping for mesh-based NoCs. The proposed reliable mapping, called RMAP, adds redundant communications to the application graph in order to improve the reliability of packet delivery in NoCs. The RMAP divides the application graph into two sub-graphs which have the lowest possible communication with each other. One of the sub-graphs is mapped on the upper triangular nodes of the NoC and the other is mapped on the lower triangular nodes. In this way, lower traffic load is imposed on some channels which are efficiently used to route packets of redundant communications. This minimizes the overheads imposed to the NoC due to redundant communications. A cycle accurate NoC simulator is used to evaluate the reliability and performance of the proposed mapping. The RMAP is also compared with the previously proposed reliability improvement methods, e.g., flow-control and flood-based methods. Simulation results reveal that the RMAP improves the reliability of an unprotected NoC by about 20%, while its performance overhead is lower than the other methods.","PeriodicalId":447746,"journal":{"name":"2010 Third International Conference on Dependability","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114393719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"End-to-End Transfer Rate Adjustment Mechanism for VANET","authors":"T. Ohta, Kazuki Ogasawara, Y. Kakuda","doi":"10.1109/DEPEND.2010.8","DOIUrl":"https://doi.org/10.1109/DEPEND.2010.8","url":null,"abstract":"In Vehicular Ad Hoc Network (VANET), the route between a source node and a destination node is intermittently broken. It is difficult to provide the reliable data transfer based on TCP because TCP is not adaptable to such networks such as VANET. Therefore, this paper proposes an end-to-end transfer rate adjustment mechanism in the application layer for VANET. The proposed mechanism consists of three functions: the transfer rate adjustment, the retransmission control, and the user priority decision. In order to realize the proposed mechanism, we introduce the improved AODV as a routing layer and UDP as a transport layer. Then, we confirmed that the proposed three functions work well in the conditions that the route breaks occur intermittently through simulation experiments. As a result, it is shown that the proposed mechanism is effective for VANET because of three functions with respect to the intermittent disconnection.","PeriodicalId":447746,"journal":{"name":"2010 Third International Conference on Dependability","volume":"44 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120870859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}