A Parallel And Scalable Multi-FPGA based Architecture for High Performance Applications (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI:10.1145/2684746.2689115

V. Viswanathan, R. B. Atitallah, J. Dekeyser, Benjamin Nakache, M. Nakache

Citations: 3

Abstract

Several industrial applications are becoming highly sophisticated and distributed as they capture and process real-time data from several sources at the same time. Furthermore, the availability of acquisition channels, such as the I/O interfaces per FPGA, also dictates how applications are partitioned over several devices. Thus, computationally intensive, resource-consuming functions are implemented on multiple hardware accelerators, making low-latency communication a crucial factor. In such applications, communication between multiple devices means using high-speed point-to-point protocols that offer little flexibility in terms of communication scalability. The problem with current systems is that they are usually built to meet the needs of a specific application, i.e., they lack the flexibility to change the communication topology or upgrade hardware resources. This leads to obsolescence, hardware redesign costs, and wasted computing power. Taking this into consideration, we propose a scalable, modular, and customizable computing platform with a parallel full-duplex communication network that redefines the computation and communication paradigm in such applications. We have implemented a scalable, distributed, secure H.264 encoding application with 3 channels over 3 customizable FPGA modules. In a distributed architecture, the inter-FPGA communication time is almost completely overshadowed by the overall execution time for bigger data sets, and is comparable to the overall execution time of a non-distributed architecture for the same implementation scaled down to 1 channel on 1 FPGA. This makes our architecture highly scalable and suitable for high-performance streaming applications. With 3 detachable FPGA modules, each sending and receiving data simultaneously at 3 GB/s, we measured a total net unidirectional traffic of 9 GB/s in the system at any given time, which makes the total net bidirectional bandwidth for 6 modules 36 GB/s.
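The bandwidth figures in the abstract can be checked with simple arithmetic. The sketch below is illustrative only: it assumes that every module sustains the stated 3 GB/s in each direction and that aggregate bandwidth grows linearly with the number of modules, which the abstract implies for the 6-module figure but does not demonstrate. The function name `aggregate_bandwidth_gbps` and its parameters are hypothetical and do not come from the paper.

```python
from typing import Tuple


def aggregate_bandwidth_gbps(modules: int,
                             per_direction_gbps: float = 3.0) -> Tuple[float, float]:
    """Return (unidirectional, bidirectional) aggregate bandwidth in GB/s.

    Assumption (not from the paper): each module sends and receives at
    `per_direction_gbps` at the same time, and the totals scale linearly
    with the number of modules.
    """
    if modules < 0:
        raise ValueError("modules must be non-negative")
    # Sum of the send rates of all modules.
    unidirectional = modules * per_direction_gbps
    # Full duplex: the receive direction carries the same amount again.
    bidirectional = 2 * unidirectional
    return unidirectional, bidirectional


if __name__ == "__main__":
    for n in (3, 6):
        uni, bi = aggregate_bandwidth_gbps(n)
        print(f"{n} modules: {uni:.0f} GB/s unidirectional, "
              f"{bi:.0f} GB/s bidirectional")
```

Under these assumptions, 3 modules give the 9 GB/s unidirectional figure stated in the abstract, and 6 modules give 18 GB/s per direction, which is 36 GB/s bidirectional.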
