{"title":"Fault tolerance in hyperbus and hypercube multiprocessors using partitioning scheme","authors":"Szu-Chi Wang, S. Kuo","doi":"10.1109/ICPADS.1994.590319","DOIUrl":null,"url":null,"abstract":"In this paper, the partitioning scheme is used to achieve fault tolerance in hyperbus and hypercube multiprocessors. Unlike other schemes, processor faults are assumed to be randomly distributed. We propose a novel and practical load redistribution method to tolerate processor faults in a hyperbus structure with insignificant overhead (a slowdown of 2 for computation and a slowdown of 3 for communication in the worst case). Standard routing and broadcasting algorithms were implemented on hypercube computers. To achieve fault tolerance, we present routing and broadcasting algorithms for a faulty hypercube with at most n-1 faults. Compared with other existing algorithms, our methods have better performance in most measures.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"106 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPADS.1994.590319","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
In this paper, the partitioning scheme is used to achieve fault tolerance in hyperbus and hypercube multiprocessors. Unlike other schemes, processor faults are assumed to be randomly distributed. We propose a novel and practical load redistribution method to tolerate processor faults in a hyperbus structure with insignificant overhead (a slowdown of 2 for computation and a slowdown of 3 for communication in the worst case). Standard routing and broadcasting algorithms were implemented on hypercube computers. To achieve fault tolerance, we present routing and broadcasting algorithms for a faulty hypercube with at most n-1 faults. Compared with other existing algorithms, our methods have better performance in most measures.