{"title":"Fault tolerance in the WebCom metacomputer","authors":"J. Morrison, James J. Kennedy, D. A. Power","doi":"10.1109/ICPPW.2001.951958","DOIUrl":null,"url":null,"abstract":"This paper addresses fault tolerance in the WebCom metacomputer. WebCom's computation platform is dynamically reconfigurable and volunteer-based. Since its constituent machines may join and leave unpredictably, fault survival and efficient fault recovery are of paramount importance. A fault tolerance mechanism is outlined that relies on a fast and efficient processor replacement procedure. It is shown that the characteristics of this procedure, together with the hierarchical and referentially transparent nature of WebCom executions, can be used to limit the effect of a fault to its immediate neighbourhood.","PeriodicalId":93355,"journal":{"name":"Proceedings of the ... ICPP Workshops on. International Conference on Parallel Processing Workshops","volume":"319 1","pages":"245-250"},"PeriodicalIF":0.0000,"publicationDate":"2001-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... ICPP Workshops on. International Conference on Parallel Processing Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPPW.2001.951958","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
This paper addresses fault tolerance in the WebCom metacomputer. WebCom's computation platform is dynamically reconfigurable and volunteer-based. Since its constituent machines may join and leave unpredictably, fault survival and efficient fault recovery are of paramount importance. A fault tolerance mechanism is outlined that relies on a fast and efficient processor replacement procedure. It is shown that the characteristics of this procedure, together with the hierarchical and referentially transparent nature of WebCom executions, can be used to limit the effect of a fault to its immediate neighbourhood.
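
The idea of confining a fault to its immediate neighbourhood can be illustrated with a small sketch. This is not WebCom's actual implementation or API: the names `Worker`, `Parent`, `WorkerFailure`, and `demo` are hypothetical, and a failure is simulated with a flag rather than detected over a network. The sketch assumes only what the abstract states, namely that tasks are referentially transparent (so re-running one gives the same result) and that a parent can swap in a replacement processor for a failed child.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class WorkerFailure(Exception):
    """Raised when a (simulated) volunteer machine has left the computation."""


@dataclass
class Worker:
    # Hypothetical stand-in for a volunteer machine.
    name: str
    alive: bool = True

    def run(self, task: Callable[[], int]) -> int:
        if not self.alive:
            raise WorkerFailure(self.name)
        return task()


@dataclass
class Parent:
    """A node that hands tasks to its children and replaces failed ones."""
    children: List[Worker]
    spares: List[Worker]
    log: List[str] = field(default_factory=list)

    def execute(self, index: int, task: Callable[[], int]) -> int:
        try:
            return self.children[index].run(task)
        except WorkerFailure:
            # Replace only the failed child; sibling workers are untouched.
            replacement = self.spares.pop(0)
            self.log.append(f"{self.children[index].name} -> {replacement.name}")
            self.children[index] = replacement
            # Re-execution is safe because the task has no side effects.
            return replacement.run(task)


def demo() -> tuple:
    parent = Parent(
        children=[Worker("w0"), Worker("w1", alive=False), Worker("w2")],
        spares=[Worker("spare0")],
    )
    # Pure tasks: each one squares a fixed number.
    tasks = [lambda n=n: n * n for n in (2, 3, 4)]
    results = [parent.execute(i, t) for i, t in enumerate(tasks)]
    return results, parent.log
```

In this sketch only the task assigned to the failed worker is rerun, and the parent is the only node that observes the failure. The sketch does not handle an empty spare pool or a replacement that also fails; a real system would need a policy for both cases.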