{"title":"On the fly estimation of the processes that are alive/crashed in an asynchronous message-passing system","authors":"A. Mostéfaoui, M. Raynal, Gilles Trédan","doi":"10.1109/PRDC.2006.48","DOIUrl":null,"url":null,"abstract":"It is well-known that, in an asynchronous system where processes are prone to crash, it is impossible to design a protocol that provides each process with the set of processes that are currently alive. Basically, this comes from the fact that it is impossible to distinguish a crashed process from a process that is very slow or with which communications are very slow. Nevertheless, designing protocols that provide the processes with good approximations of the set of processes that are currently alive remains a real challenge in fault-tolerant distributed computing. This paper proposes such a protocol. To that end, it considers a realistic computation model where the processes are provided with non-synchronized local clocks and a function alpha(). That function takes a local duration as a parameter, and returns an integer that is an estimate of the number of processes that can crash during that duration. A simulation-based experimental evaluation of the protocol is also presented. 
The experiments show that the protocol is practically relevant","PeriodicalId":314915,"journal":{"name":"2006 12th Pacific Rim International Symposium on Dependable Computing (PRDC'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2006 12th Pacific Rim International Symposium on Dependable Computing (PRDC'06)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PRDC.2006.48","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0
Abstract
It is well known that, in an asynchronous system where processes are prone to crash, it is impossible to design a protocol that provides each process with the set of processes that are currently alive. This impossibility stems from the fact that a crashed process cannot be distinguished from a process that is very slow or whose communication channels are very slow. Nevertheless, designing protocols that give processes good approximations of the set of currently alive processes remains a real challenge in fault-tolerant distributed computing. This paper proposes such a protocol. To that end, it considers a realistic computation model in which processes are equipped with non-synchronized local clocks and a function alpha(). That function takes a local duration as a parameter and returns an integer that estimates the number of processes that can crash during that duration. A simulation-based experimental evaluation of the protocol is also presented. The experiments show that the protocol is practically relevant.
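To make the role of alpha() more concrete, the following is a minimal sketch and not the protocol from the paper. It assumes a linear crash model for alpha() and a simple "recently heard from" rule; the names `crash_rate`, `last_heard`, `window`, and `estimate_alive` are hypothetical and do not appear in the abstract.

```python
import math


def alpha(duration: float, crash_rate: float, n: int) -> int:
    """Hypothetical alpha(): estimated number of processes (out of n)
    that can crash during `duration` units of local time.

    The linear model crash_rate * n * duration is an assumption made
    for illustration; the abstract does not define alpha() concretely.
    """
    if duration < 0:
        raise ValueError("duration must be non-negative")
    return min(n, math.ceil(crash_rate * n * duration))


def estimate_alive(last_heard: dict, now: float, window: float,
                   crash_rate: float):
    """Return (candidates, lower_bound) from one process's local view.

    last_heard maps a process id to the local clock value at which the
    last message from that process was received. Only the local clock
    is used, so no clock synchronization is assumed.
    """
    # Processes heard from within the last `window` local time units.
    candidates = {p for p, t in last_heard.items() if now - t <= window}
    # Up to alpha(window) of them may have crashed since they were heard.
    may_have_crashed = alpha(window, crash_rate, len(last_heard))
    lower_bound = max(0, len(candidates) - may_have_crashed)
    return candidates, lower_bound


if __name__ == "__main__":
    last_heard = {"p1": 98.0, "p2": 60.0, "p3": 95.0}
    candidates, lower_bound = estimate_alive(
        last_heard, now=100.0, window=5.0, crash_rate=0.01
    )
    print(sorted(candidates), lower_bound)
```

In this sketch, a shorter window yields a smaller alpha() value but also fewer candidates, which mirrors the trade-off between accuracy and freshness that any such approximation has to handle.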