Reconfigurations for Processor Arrays with Faulty Switches and Links

W. Jigang, Longting Zhu, Peilan He, Guiyuan Jiang
Published in: 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 141-148. Publication date: 2015-05-04. DOI: 10.1109/CCGrid.2015.47 (https://doi.org/10.1109/CCGrid.2015.47)
Citations: 3

Abstract

Large-scale multiprocessor arrays suffer from frequent hardware defects or soft faults caused by overheating, overload, or occupancy by other running applications. To obtain a fault-free logical array, reconfiguration techniques reuse the fault-free processing elements (PEs) by changing the interconnections among them. Previous research on this topic assumes that switches and links are fault-free. In this paper, we consider faults not only on the PEs but also on the switches and links, and develop efficient algorithms to construct logical arrays that are as large as possible with optimized network length. To deal with faults on switches and links, we design an efficient pre-processing procedure in which switch faults are transformed into link faults, and the faulty links are then classified into several categories for handling. We then propose an efficient algorithm, A-MLA, to produce as many logical columns as possible, which are combined to form a two-dimensional processor array. After that, we propose an algorithm, A-TMLA, to reduce the interconnection length of the logical array obtained by A-MLA, since shorter interconnects lead to lower communication latency and power consumption. Extensive experimental results show that, even with switch and link faults, our approach produces larger fault-free logical arrays with shorter interconnection length than the state of the art.
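The abstract does not spell out how switch faults are transformed into link faults. As a rough illustration only, the sketch below assumes the simplest reading: every link incident to a faulty switch is treated as faulty. The grid model, the coordinate type, and the function names `grid_links` and `switch_faults_to_link_faults` are hypothetical and are not taken from the paper; the paper's later classification of faulty links into categories is not modeled here.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical model: a switch is identified by its (row, col) position,
# and a link is an undirected pair of switch positions.
Node = Tuple[int, int]
Link = FrozenSet[Node]


def grid_links(rows: int, cols: int) -> List[Link]:
    """Build the horizontal and vertical links of a rows x cols mesh."""
    links: List[Link] = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                links.append(frozenset({(r, c), (r, c + 1)}))
            if r + 1 < rows:
                links.append(frozenset({(r, c), (r + 1, c)}))
    return links


def switch_faults_to_link_faults(
    links: Iterable[Link],
    faulty_switches: Set[Node],
    faulty_links: Set[Link],
) -> Set[Link]:
    """Fold switch faults into the faulty-link set.

    Assumption (not from the paper): a faulty switch makes every link
    attached to it unusable, so those links are marked faulty.
    """
    result = set(faulty_links)
    for link in links:
        if any(node in faulty_switches for node in link):
            result.add(link)
    return result


if __name__ == "__main__":
    mesh = grid_links(2, 2)
    known_bad = {frozenset({(1, 0), (1, 1)})}
    all_bad = switch_faults_to_link_faults(mesh, {(0, 0)}, known_bad)
    print(len(all_bad))
```

After this step, a reconfiguration routine under the same assumption would only need to reason about faulty PEs and faulty links, since the switch faults are already represented in the link set.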