Fault-tolerant message switching based on wormhole switching and backtracking

Manabu Sueishi, M. Kitakami, Hideo Ito
Published in: 10th IEEE Pacific Rim International Symposium on Dependable Computing, 2004. Proceedings.
Publication date: 2004-03-03
DOI: 10.1109/PRDC.2004.1276569 (https://doi.org/10.1109/PRDC.2004.1276569)
Citations: 1

Abstract

Parallel computers are now widely used for applications that require large amounts of computation. In a NO Remote memory Access (NORA) parallel computer, many processors are connected by communication links, and results are obtained through communication among the processors. The message switching method, which controls message transmission in the parallel computer, is one of the most important factors in its performance. Because parallel computers contain many processors, their failure rate is very high, and many fault-tolerant switching methods have been proposed. The existing methods, however, suffer from problems such as low communication throughput, low fault-tolerant capability, and large hardware overhead. We propose a fault-tolerant switching method that improves on wormhole switching. The proposed method inserts dummy flits, which carry no information, after the header flit (the first flit of the packet). By overwriting a dummy flit with the header flit, backtracking is implemented without hardware overhead. Computer simulation shows that in a 16 × 16 2D torus, for example, the throughput of the proposed method is almost equal to that of existing methods that require large hardware overhead, provided the number of faulty nodes is less than 40.
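The dummy-flit idea in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not give the number of dummy flits, the routing function, or the flit format, so the names `make_packet`, `backtrack`, and `torus_neighbors`, and the rule that each backtrack step consumes exactly one dummy flit, are all hypothetical choices made for illustration.

```python
# Toy model of a wormhole packet with dummy flits (illustrative assumptions only).
HEADER, DUMMY, DATA = "H", "D", "P"


def torus_neighbors(x, y, n=16):
    """Return the four neighbors of node (x, y) in an n-by-n 2D torus.

    Coordinates wrap around at the edges, which is what makes the
    topology a torus rather than a mesh.
    """
    return [((x + 1) % n, y), ((x - 1) % n, y),
            (x, (y + 1) % n), (x, (y - 1) % n)]


def make_packet(payload_len, num_dummies):
    """Build a packet as a list of flits: header, dummy flits, then data flits.

    num_dummies is a free parameter here; the abstract does not state it.
    """
    return [HEADER] + [DUMMY] * num_dummies + [DATA] * payload_len


def backtrack(flits):
    """Model one backtrack step.

    Assumption: the blocked header is abandoned and the dummy flit right
    behind it is overwritten with the header contents, so the packet
    front retreats by one flit and one dummy flit is used up.
    """
    if len(flits) < 2 or flits[1] != DUMMY:
        raise ValueError("no dummy flit left to overwrite")
    return [HEADER] + flits[2:]


if __name__ == "__main__":
    packet = make_packet(payload_len=3, num_dummies=2)
    print(packet)
    packet = backtrack(packet)
    print(packet)
    print(torus_neighbors(0, 0))
```

In this model the number of dummy flits bounds how many consecutive backtrack steps a packet can take, which is one plausible reading of why the scheme needs no extra hardware: the spare flits travel inside the packet itself.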