High-Level Description and Synthesis of Floating-Point Accumulators on FPGA

2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2013-04-28 DOI:10.1109/FCCM.2013.37

Marc-André Daigneault, J. David

{"title":"High-Level Description and Synthesis of Floating-Point Accumulators on FPGA","authors":"Marc-André Daigneault, J. David","doi":"10.1109/FCCM.2013.37","DOIUrl":null,"url":null,"abstract":"Decades of research in the field of high level hardware description now result in tools that are able to automatically transform C/C++ constructs into highly optimized parallel and pipelined architectures. Such approaches work fine when the control flow is a priory known since the computation results in a large dataflow graph that can be mapped into the available operators. Nevertheless, some applications have a control flow that is highly dependant on the data. This paper focuses on the hardware implementation of such applications and presents a high level synthesis methodology applied to a Hardware Description Language (HDL) in which assignments correspond to self-synchronized connections between predefined data streaming sources and sinks. A data transfer occurs over an established connection when both source and sink are ready, according to their synchronization interfaces. Founded on a high-level communicating FSM programming model, the language allows the user to describe and dynamically modify streaming architectures exploiting spatial and temporal parallelism. Our compiler attempts to maximize the number of transfers at each clock cycle and automatically fixes the potential combinatorial loops induced by the dynamic connection of dependant sources and sinks. The methodology is applied to the synthesis of a pipelined floating point accumulator using the Delayed-Buffering (DB) reduction method. The results we obtain are similar to state-of-the-art dedicated architectures but require much less design time and expertise.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2013.37","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

Abstract

Decades of research in the field of high level hardware description now result in tools that are able to automatically transform C/C++ constructs into highly optimized parallel and pipelined architectures. Such approaches work fine when the control flow is a priory known since the computation results in a large dataflow graph that can be mapped into the available operators. Nevertheless, some applications have a control flow that is highly dependant on the data. This paper focuses on the hardware implementation of such applications and presents a high level synthesis methodology applied to a Hardware Description Language (HDL) in which assignments correspond to self-synchronized connections between predefined data streaming sources and sinks. A data transfer occurs over an established connection when both source and sink are ready, according to their synchronization interfaces. Founded on a high-level communicating FSM programming model, the language allows the user to describe and dynamically modify streaming architectures exploiting spatial and temporal parallelism. Our compiler attempts to maximize the number of transfers at each clock cycle and automatically fixes the potential combinatorial loops induced by the dynamic connection of dependant sources and sinks. The methodology is applied to the synthesis of a pipelined floating point accumulator using the Delayed-Buffering (DB) reduction method. The results we obtain are similar to state-of-the-art dedicated architectures but require much less design time and expertise.

查看原文本刊更多论文

FPGA上浮点累加器的高级描述与综合

在高级硬件描述领域几十年的研究现在产生了能够自动将C/ c++结构转换为高度优化的并行和流水线体系结构的工具。当控制流是已知的优先级时，这种方法可以很好地工作，因为计算结果是一个可以映射到可用操作符的大型数据流图。然而，一些应用程序具有高度依赖于数据的控制流。本文着重于此类应用的硬件实现，并提出了一种应用于硬件描述语言(HDL)的高级综合方法，其中分配对应于预定义数据流源和汇之间的自同步连接。当源和接收都准备好时，根据它们的同步接口，在已建立的连接上进行数据传输。该语言基于高级通信FSM编程模型，允许用户描述和动态修改利用空间和时间并行性的流架构。我们的编译器试图最大化每个时钟周期的传输数量，并自动修复由依赖源和汇的动态连接引起的潜在组合环路。将该方法应用于使用延迟缓冲(DB)缩减方法的流水线浮点累加器的合成。我们得到的结果类似于最先进的专用架构，但需要更少的设计时间和专业知识。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines

自引率

0.00%

发文量