FPGAs in the Network and Novel Communicator Support Accelerate MPI Collectives

2020 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2020-09-22 DOI:10.1109/HPEC43674.2020.9286200

Pouya Haghi, Anqi Guo, Qingqing Xiong, Rushi Patel, Chen Yang, Tong Geng, Justin T. Broaddus, Ryan J. Marshall, A. Skjellum, M. Herbordt

{"title":"FPGAs in the Network and Novel Communicator Support Accelerate MPI Collectives","authors":"Pouya Haghi, Anqi Guo, Qingqing Xiong, Rushi Patel, Chen Yang, Tong Geng, Justin T. Broaddus, Ryan J. Marshall, A. Skjellum, M. Herbordt","doi":"10.1109/HPEC43674.2020.9286200","DOIUrl":null,"url":null,"abstract":"MPI collective operations can often be performance killers in HPC applications; we seek to solve this bottleneck by offloading them to reconfigurable hardware within the switch itself, rather than, e.g., the NIC. We have designed a hardware accelerator MPI-FPGA to implement six MPI collectives in the network. Preliminary results show that MPI-FPGA achieves on average 3.9× speedup over conventional clusters in the most likely scenarios. Essential to this work is providing support for sub-communicator collectives. We introduce a novel mechanism that enables the hardware to support a large number of communicators of arbitrary shape, and that is scalable to very large systems. We show how communicator support can be integrated easily into an in-switch hardware accelerator to implement MPI communicators and so enable full offload of MPI collectives. While this mechanism is universally applicable, we implement it in an FPGA cluster; FPGAs provide the ability to couple communication and computation and so are an ideal testbed and have a number of other architectural benefits. MPI-FPGA is fully integrated into MPICH and so transparently usable by MPI annlications.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC43674.2020.9286200","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 12

Abstract

MPI collective operations can often be performance killers in HPC applications; we seek to solve this bottleneck by offloading them to reconfigurable hardware within the switch itself, rather than, e.g., the NIC. We have designed a hardware accelerator MPI-FPGA to implement six MPI collectives in the network. Preliminary results show that MPI-FPGA achieves on average 3.9× speedup over conventional clusters in the most likely scenarios. Essential to this work is providing support for sub-communicator collectives. We introduce a novel mechanism that enables the hardware to support a large number of communicators of arbitrary shape, and that is scalable to very large systems. We show how communicator support can be integrated easily into an in-switch hardware accelerator to implement MPI communicators and so enable full offload of MPI collectives. While this mechanism is universally applicable, we implement it in an FPGA cluster; FPGAs provide the ability to couple communication and computation and so are an ideal testbed and have a number of other architectural benefits. MPI-FPGA is fully integrated into MPICH and so transparently usable by MPI annlications.

查看原文本刊更多论文

网络中的fpga和新型通信器支持加速MPI集体

在高性能计算应用程序中，MPI集合操作通常是性能杀手;我们试图通过将它们卸载到交换机本身的可重构硬件来解决这个瓶颈，而不是，例如，NIC。我们设计了一个硬件加速器MPI- fpga来实现网络中的6个MPI集合。初步结果表明，MPI-FPGA在最可能的情况下比传统集群平均实现3.9倍的加速。这项工作的关键是为子传播者群体提供支持。我们引入了一种新的机制，使硬件能够支持任意形状的大量通信器，并且可以扩展到非常大的系统。我们展示了如何将通信器支持轻松集成到开关内硬件加速器中，以实现MPI通信器，从而实现MPI集合的完全卸载。虽然这种机制是普遍适用的，但我们在FPGA集群中实现它;fpga提供了耦合通信和计算的能力，因此是一个理想的测试平台，并具有许多其他架构优势。MPI- fpga完全集成到MPICH中，因此可以透明地使用MPI通信。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2020 IEEE High Performance Extreme Computing Conference (HPEC)

自引率

0.00%

发文量