Scalable Low-Latency Inter-FPGA Networks

K. Pham, Truong Thao Nguyen, Hiroshi Yamaguchi, Y. Urino, M. Koibuchi
2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), published 2022-05-01
DOI: https://doi.org/10.1109/ipdps53621.2022.00031
Citations: 2

Abstract

A cutting-edge FPGA card can be equipped with many high-bandwidth I/Os by means of high-density optical integration, e.g., onboard Si-photonics transceivers, to provide high network bandwidth for memory-to-memory inter-FPGA communication. This study presents a scalable switchless network architecture for such cards. It exploits an indirect path, consisting of two one-hop paths, to enable a diameter-2 network topology. It then adopts a Kautz network topology with a diameter of two, which connects d(d + 1) FPGAs with a degree of d, close to the theoretical upper bound. The Kautz network topology has bi-directional links and uni-directional links, which form triangles. Uni-directional links make it difficult to avoid channel buffer overflow because existing link-level flow control assumes a bi-directional link. This study presents an indirect flow control along a uni-directional triangle embedded in the Kautz network topology. It then develops a combination of unicasts that forms multi-port collective communications to mitigate the influence of the startup latency on the execution time. Since a high-degree FPGA card makes it difficult to accommodate many I/O ports on the panel of a 1U compute server, we propose using WDM (Wavelength Division Multiplexing) as an alternative and present its efficient mapping onto arrayed waveguide gratings (AWGs). The required number of wavelengths becomes d on d + 1 AWG devices. Based on our experimental results with OPTWEB on custom Stratix 10 FPGA cards, SimGrid simulation results show that our collective communication is 7× faster than that of Dragonfly with 272 FPGAs.
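The topological claims in the abstract can be checked on a small model. The following is a minimal sketch, not the authors' implementation: it assumes the standard textbook construction of the diameter-2 Kautz digraph (nodes are ordered pairs of distinct symbols from an alphabet of size d + 1, with a link from (a, b) to (b, c) whenever c differs from b). The function names `kautz_graph`, `diameter`, `classify_links`, and `on_directed_triangle` are hypothetical helpers introduced here for illustration. The comparison value 1 + d + d² is the standard Moore bound for directed graphs of out-degree d and diameter 2; whether this is exactly the bound the paper refers to is an assumption.

```python
from collections import deque
from itertools import permutations


def kautz_graph(d):
    """Adjacency dict of the diameter-2 Kautz digraph with out-degree d.

    Assumed construction: node (a, b) with a != b links to every (b, c) with c != b.
    """
    symbols = range(d + 1)
    nodes = permutations(symbols, 2)  # all ordered pairs (a, b) with a != b
    return {(a, b): [(b, c) for c in symbols if c != b] for (a, b) in nodes}


def diameter(adj):
    """Longest shortest-path hop count over all ordered node pairs (BFS per node)."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph is not strongly connected")
        worst = max(worst, max(dist.values()))
    return worst


def classify_links(adj):
    """Split directed links into those with a reverse link and those without."""
    bidirectional, unidirectional = [], []
    for u, outs in adj.items():
        for v in outs:
            (bidirectional if u in adj[v] else unidirectional).append((u, v))
    return bidirectional, unidirectional


def on_directed_triangle(adj, u, v):
    """True if link u -> v closes a directed 3-cycle u -> v -> w -> u."""
    return any(u in adj[w] for w in adj[v])


if __name__ == "__main__":
    d = 16
    g = kautz_graph(d)
    bidir, unidir = classify_links(g)
    print("nodes:", len(g), "Moore bound:", 1 + d + d * d)
    print("diameter:", diameter(g))
    print("directed links with a reverse link:", len(bidir))
    print("uni-directional links:", len(unidir))
    print("all uni-directional links on a triangle:",
          all(on_directed_triangle(g, u, v) for u, v in unidir))
```

Under this construction, each node (a, b) has exactly one bi-directional link, to its reverse (b, a), and its remaining d - 1 outgoing links are uni-directional and lie on a cycle (a, b) → (b, c) → (c, a) → (a, b). With d = 16 the node count is d(d + 1) = 272, which is consistent with the 272-FPGA configuration in the evaluation if that configuration corresponds to d = 16.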