Exploiting redundancy to speed up parallel systems
I. Yen, E. Leiss, F. Bastani
IEEE Parallel & Distributed Technology: Systems & Applications, August 1993 (journal article)
DOI: 10.1109/88.242445
Citations: 8
Abstract
Repetitive fault tolerance takes advantage of redundant processors to offer peak performance during normal execution and graceful performance degradation when processors fail. As long as one processor is working, the computation can continue. The authors use the underlying principle of inherent fault tolerance, turning redundancy into computation power, to design a model of repetitive fault tolerance that is suitable for dataflow computations. When no processors fail, they all work in parallel to achieve performance almost equal to that of the same parallel program without fault tolerance. If processors do fail, the program can still derive the correct result as long as at least one processor is working; failures only slow down the computation. Repetitive fault tolerance also provides a systematic way to derive fault-tolerant programs.
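The abstract does not spell out the model itself, so the following is only a minimal Python sketch of the general idea under our own assumptions, not the authors' formulation. We assume an acyclic dataflow graph of deterministic, side-effect-free tasks; processors that run in lock-step rounds and share one result table; and fail-stop failures at round boundaries. Every processor is able to run any task, but each scans the task list from a different starting offset and skips work already finished. The names `run_repetitive`, `Task`, and `fail_at` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical task format: (name, names of input tasks, pure function of inputs).
Task = Tuple[str, Sequence[str], Callable[..., int]]


def run_repetitive(
    tasks: List[Task],
    num_procs: int,
    fail_at: Optional[Dict[int, int]] = None,
) -> Tuple[Dict[str, int], int]:
    """Simulate redundant processors on an acyclic dataflow graph.

    Processors run in lock-step rounds and share one result table.
    fail_at maps a processor id to the round in which it crash-stops
    (an assumed fail-stop model). Returns the results and rounds used.
    """
    fail_at = fail_at or {}
    results: Dict[str, int] = {}
    n = len(tasks)
    rounds = 0
    while len(results) < n:
        alive = [p for p in range(num_procs) if p not in fail_at or fail_at[p] > rounds]
        if not alive:
            raise RuntimeError("all processors have failed")
        writes: Dict[str, int] = {}
        for p in alive:
            # Staggered starting points spread the processors over different
            # tasks, so without failures little work is duplicated.
            start = p * n // num_procs
            for k in range(n):
                name, deps, fn = tasks[(start + k) % n]
                if name in results or not all(d in results for d in deps):
                    continue  # already computed, or inputs not yet available
                # Two processors may pick the same task in one round; this is
                # harmless because tasks are deterministic and side-effect free.
                writes[name] = fn(*(results[d] for d in deps))
                break
        results.update(writes)
        rounds += 1
    return results, rounds


# Usage: sum of squares of 1..8 as eight independent tasks plus one reduction.
values = list(range(1, 9))
tasks: List[Task] = [(f"sq{i}", (), lambda v=v: v * v) for i, v in enumerate(values)]
tasks.append(("total", [f"sq{i}" for i in range(len(values))], lambda *xs: sum(xs)))

results, rounds = run_repetitive(tasks, num_procs=4)
print(results["total"], rounds)  # 204 3
print(run_repetitive(tasks, 4, {1: 1, 2: 1, 3: 1})[1])  # 6: three processors stop after round 0
print(run_repetitive(tasks, 4, {1: 0, 2: 0, 3: 0})[1])  # 9: one processor does all the work
```

In this sketch, four healthy processors finish in 3 rounds, while a single survivor still reaches the same result in 9 rounds. This mirrors the abstract's claim that failures only slow the computation. Re-executing a task is safe here only because dataflow tasks are assumed to be pure; a real system would also need a fault-tolerant shared store, which the sketch does not model.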