K. Tilly, István Kiss, G. Román, T. Dobrowiecki, A. Várkonyi-Kóczy
A method for the construction and interpretation of high level models for distributed fault-tolerant systems
Proceedings. 14th Symposium on Reliable Distributed Systems
Publication date: 1995-09-13
DOI: 10.1109/RELDIS.1995.526215 (https://doi.org/10.1109/RELDIS.1995.526215)
Citation count: 1
Abstract
Traditional solutions for achieving fault tolerance are intended for use at design time, and they generally capture system information at a very low (hardware or machine instruction) level. Increasing the reliability of complex information systems containing many (perhaps many thousands of) autonomous components requires different solutions. This article presents a new methodology for the implementation of large-scale, distributed fault-tolerant systems. System models are formed of objects describing requirements, services and resources, organized into high-level, top-down hierarchical decomposition structures. Since redundancy is a natural property of any large-scale system, such models make it possible to achieve fault-tolerant behaviour by finding multiple appropriate mappings between requirements and available services, and by supporting the required services with available resources. The distributed system is extended with dedicated components, called diagnostic centres, which manage distinct parts of the system model, continuously observe the operation of the distributed system, and find alternative requirement-service mappings if some services fail to fulfil their associated requirements. The elements and the structure of the proposed system modelling method are presented, an appropriate fault model is defined, and the algorithms for model interpretation are described.
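The remapping idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify; it only assumes that a requirement may be fulfilled by any healthy service whose resources are available, and that a diagnostic centre rebinds a requirement when its current service fails. The names `Service`, `DiagnosticCentre`, `bind` and `report_failure` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical service object: fulfils one requirement, needs some resources."""
    name: str
    provides: str                    # name of the requirement it can fulfil
    needs: frozenset = frozenset()   # names of the resources it depends on
    healthy: bool = True


@dataclass
class DiagnosticCentre:
    """Hypothetical manager of one part of the model (an assumption, not the paper's design)."""
    services: list
    resources: set
    mapping: dict = field(default_factory=dict)   # requirement -> bound Service

    def candidates(self, requirement):
        # Redundancy shows up as more than one usable service per requirement.
        return [s for s in self.services
                if s.provides == requirement
                and s.healthy
                and s.needs <= self.resources]

    def bind(self, requirement):
        # Pick the first usable candidate; drop the binding if none is left.
        options = self.candidates(requirement)
        if not options:
            self.mapping.pop(requirement, None)
            return None
        self.mapping[requirement] = options[0]
        return options[0]

    def report_failure(self, service_name):
        # Mark the service as failed, then look for an alternative mapping
        # for every requirement that was bound to it.
        for s in self.services:
            if s.name == service_name:
                s.healthy = False
        affected = [r for r, s in self.mapping.items() if s.name == service_name]
        return {r: self.bind(r) for r in affected}
```

For example, with two services that both provide a `storage` requirement, binding selects the first one, and reporting its failure rebinds the requirement to the second. A real system would also need the hierarchical decomposition and the fault model that the paper describes, neither of which this sketch attempts.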