K. Pham, Truong Thao Nguyen, Hiroshi Yamaguchi, Y. Urino, M. Koibuchi
Scalable Low-Latency Inter-FPGA Networks
2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), published 2022-05-01
DOI: https://doi.org/10.1109/ipdps53621.2022.00031
A cutting-edge FPGA card can be equipped with many high-bandwidth I/Os by means of high-density optical integration, e.g., onboard Si-photonics transceivers, to provide high network bandwidth for memory-to-memory inter-FPGA communication. This study presents a scalable switchless network architecture for such cards that exploits indirect paths, each consisting of two one-hop paths, to enable a diameter-2 network topology. It adopts a Kautz network topology with a diameter of two, which connects d(d + 1) FPGAs with a degree of d, close to the theoretical upper bound. The Kautz network topology has both bi-directional links and uni-directional links, the latter forming triangles. Uni-directional links make it difficult to avoid channel buffer overflow because existing link-level flow control assumes a bi-directional link. This study therefore presents an indirect flow control along a uni-directional triangle embedded in the Kautz network topology. It then develops a combination of unicasts that forms multi-port collective communications to mitigate the influence of the startup latency on the execution time. Since a high-degree FPGA card makes it difficult to fit many I/O ports on the panel of a 1U compute server, we propose using WDM (Wavelength Division Multiplexing) as an alternative and present its efficient mapping onto arrayed waveguide gratings (AWGs). The required number of wavelengths becomes d on d + 1 AWG devices. Based on our experimental results with OPTWEB of custom Stratix10 FPGA cards, SimGrid simulation results show that our collective communication is 7× faster than that of Dragonfly with 272 FPGAs.
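
The topology claims in the abstract can be checked with a small sketch. The Python below is an illustration using the standard textbook construction of a diameter-2 Kautz graph, not the authors' implementation: nodes are ordered pairs (a, b) with a != b over d + 1 symbols, and (a, b) links to (b, c) for every c != b. The function names `kautz_diameter2`, `diameter`, and `return_path` are hypothetical. `return_path` only shows which route leads back to the sender of a link (the reverse link, or the other two edges of a uni-directional triangle); treating that route as the one an indirect flow-control signal would follow is an assumption made for illustration.

```python
from collections import deque
from itertools import permutations


def kautz_diameter2(d):
    """Diameter-2 Kautz graph as an adjacency dict (textbook construction).

    Nodes are ordered pairs (a, b), a != b, over d + 1 symbols;
    node (a, b) has an outgoing link to (b, c) for every c != b.
    """
    nodes = permutations(range(d + 1), 2)
    return {(a, b): [(b, c) for c in range(d + 1) if c != b] for a, b in nodes}


def diameter(graph):
    """Longest shortest-path length over all ordered node pairs (BFS per node)."""
    worst = 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(graph):
            raise ValueError("graph is not strongly connected")
        worst = max(worst, max(dist.values()))
    return worst


def return_path(graph, u, v):
    """Route from v back to u for the link u -> v.

    Returns [v, u] when the link is bi-directional, otherwise [v, w, u]
    along the uni-directional triangle u -> v -> w -> u.
    """
    if u in graph[v]:
        return [v, u]
    (a, _), (_, c) = u, v          # u = (a, b), v = (b, c) with c != a
    w = (c, a)                     # third corner of the triangle
    assert w in graph[v] and u in graph[w]
    return [v, w, u]


if __name__ == "__main__":
    d = 16
    g = kautz_diameter2(d)
    # d(d + 1) nodes versus the directed Moore bound 1 + d + d^2 for diameter 2
    print(len(g), 1 + d + d * d, diameter(g))   # prints: 272 273 2
    print(return_path(g, (0, 1), (1, 0)))       # prints: [(1, 0), (0, 1)]
    print(return_path(g, (0, 1), (1, 2)))       # prints: [(1, 2), (2, 0), (0, 1)]
```

With d = 16 the construction yields the 272 nodes used in the evaluation, one short of the directed Moore bound of 273. In this construction each node has exactly one bi-directional link (to its reversed pair), and its remaining d - 1 outgoing links are uni-directional and each lie on a triangle.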