Scalable Low-Latency Inter-FPGA Networks

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Pub Date: 2022-05-01, DOI: 10.1109/ipdps53621.2022.00031

K. Pham, Truong Thao Nguyen, Hiroshi Yamaguchi, Y. Urino, M. Koibuchi


Citations: 2

Abstract


Scalable Low-Latency Inter-FPGA Networks

A cutting-edge FPGA card can be equipped with many high-bandwidth I/Os by means of high-density optical integration, e.g., onboard Si-photonics transceivers, to provide high network bandwidth for memory-to-memory inter-FPGA communication. This study presents a scalable switchless network architecture for such cards that exploits an indirect path, consisting of two one-hop paths, to enable a diameter-2 network topology. It adopts a Kautz network topology with a diameter of two to connect d(d + 1) FPGAs, each with a degree of d, which is close to the theoretical upper bound. Kautz network topologies contain both bi-directional links and uni-directional links, the latter forming triangles. Uni-directional links make it difficult to avoid channel buffer overflow because existing link-level flow control assumes a bi-directional link. This study therefore presents an indirect flow control along a uni-directional triangle embedded in the Kautz network topology. It then develops a combination of unicasts that forms multi-port collective communications to mitigate the influence of the startup latency on the execution time. Since it is difficult to fit the many I/O ports of a high-degree FPGA card on the panel of a 1U compute server, we propose using WDM (Wavelength Division Multiplexing) as an alternative and present an efficient mapping onto arrayed waveguide gratings (AWGs). The required number of wavelengths is d on d + 1 AWG devices. Based on our experimental results with OPTWEB of custom Stratix10 FPGA cards, SimGrid simulation results show that our collective communication is 7× faster than that of Dragonfly with 272 FPGAs.
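The abstract does not spell out how the topology is built, so the following is only a minimal sketch based on the standard textbook definition of the Kautz digraph K(d, 2), not the authors' implementation. Nodes are ordered pairs of distinct symbols drawn from an alphabet of d + 1 symbols, and node (a, b) has a link to every node (b, c) with c different from b. The function names (`kautz_graph`, `route`, `diameter`, `link_kinds`) are hypothetical and chosen for illustration. The sketch checks the properties the abstract relies on: d(d + 1) nodes of degree d, diameter two via an indirect path of two one-hop paths, and a mix of bi-directional links and uni-directional links that close into triangles.

```python
from collections import deque
from itertools import permutations


def kautz_graph(d):
    """Adjacency dict of the Kautz digraph K(d, 2) (textbook definition)."""
    symbols = range(d + 1)
    # Nodes are ordered pairs (a, b) with a != b, so there are d * (d + 1).
    nodes = permutations(symbols, 2)
    # (a, b) links to every (b, c) with c != b, giving out-degree d.
    return {(a, b): [(b, c) for c in symbols if c != b] for (a, b) in nodes}


def route(src, dst):
    """Node sequence from src to dst that uses at most two hops."""
    if src == dst:
        return [src]
    if src[1] == dst[0]:
        return [src, dst]                    # direct one-hop path
    return [src, (src[1], dst[0]), dst]      # indirect path: two one-hop paths


def diameter(adj):
    """Longest shortest-path length, found by BFS from every node."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph is not strongly connected")
        worst = max(worst, max(dist.values()))
    return worst


def link_kinds(adj):
    """Return (bi-directional link count, uni-directional link count)."""
    paired = sum(1 for u in adj for v in adj[u] if u in adj[v])
    total = sum(len(vs) for vs in adj.values())
    return paired // 2, total - paired


def closes_triangle(adj, u, v):
    """True if the link u -> v lies on a directed triangle u -> v -> w -> u."""
    return any(u in adj[w] for w in adj[v])


if __name__ == "__main__":
    d = 16
    adj = kautz_graph(d)
    print("nodes:", len(adj), "diameter:", diameter(adj))
    print("bi-directional, uni-directional links:", link_kinds(adj))
```

With d = 16 the construction yields 16 × 17 = 272 nodes, which matches the 272-FPGA configuration mentioned in the abstract, and the BFS check reports a diameter of two. The flow control, collective communication and AWG wavelength mapping described in the abstract are not modeled here.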
