Operation scheduling and architecture co-synthesis for energy-efficient dataflow computations on FPGAs (abstract only)

FPGA. ACM International Symposium on Field-Programmable Gate Arrays. Pub Date: 2012-02-22. DOI: 10.1145/2145694.2145757

C. Y. Lin, N. Wong, Hayden Kwok-Hay So


Citations: 1

Abstract

Compiling high-level user applications for execution on FPGAs often involves synthesizing dataflow graphs that exceed the available on-chip computational resources. One way to address this is to fold the execution of a given dataflow graph onto an array of directly connected, simple configurable processing elements (CPEs). In this scenario, the performance and energy efficiency of the resulting system depend not only on the schedule that maps compute operations onto the CPEs, but also on the topology of the interconnect that links the CPEs. This paper presents a framework in which the operation scheduler and the underlying CPE interconnect topology are co-optimized on a per-application basis for energy-efficient FPGA computation. For the same application, connecting the CPEs with different common regular array topologies produced a difference of more than 2.5x in energy efficiency. Moreover, irregular application-specific interconnect topologies derived from a genetic algorithm improved the energy-delay product by up to 50% compared with even the best regular topology. Such a framework is anticipated to serve as part of a rapid high-level FPGA application compiler, since only minimal hardware place-and-route is needed to generate the optimal schedule and topology.
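The abstract does not describe the scheduler, the energy model, or the genetic algorithm in detail, so the following is only a toy sketch of why the interconnect topology can change the energy-delay product of a folded dataflow graph. Everything in it is an assumption made for illustration: the function names `hop_distances` and `schedule`, the greedy earliest-finish placement rule, the one-cycle operation latency, the one-cycle-per-hop transfer delay, and the energy constants `e_op` and `e_hop`. It also assumes a connected topology and ignores the wiring cost of the links themselves.

```python
from collections import deque

def hop_distances(n, links):
    """All-pairs hop counts over an undirected CPE interconnect (BFS per source)."""
    adj = {i: set() for i in range(n)}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    dist = []
    for src in range(n):
        d = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist

def schedule(dfg, n, links, e_op=1.0, e_hop=0.5):
    """Greedy list scheduling (hypothetical model, not the paper's scheduler).

    dfg maps each op to its predecessor list, with keys in topological order.
    Each op takes one cycle; an operand needs one cycle per hop to travel.
    Returns (delay, energy, energy_delay_product).
    """
    dist = hop_distances(n, links)
    cpe_free = [0] * n          # cycle at which each CPE becomes idle
    place, finish = {}, {}
    energy = 0.0
    for op, preds in dfg.items():
        best = None
        for c in range(n):
            # Earliest cycle at which all operands have reached CPE c.
            ready = max((finish[p] + dist[place[p]][c] for p in preds), default=0)
            start = max(ready, cpe_free[c])
            hops = sum(dist[place[p]][c] for p in preds)
            cand = (start + 1, hops, c)   # prefer early finish, then fewer hops
            if best is None or cand < best:
                best = cand
        end, hops, c = best
        place[op], finish[op] = c, end
        cpe_free[c] = end
        energy += e_op + e_hop * hops
    delay = max(finish.values())
    return delay, energy, energy * delay

# A 7-operation reduction tree folded onto 4 CPEs under two regular topologies.
dfg = {"a": [], "b": [], "c": [], "d": [],
       "e": ["a", "b"], "f": ["c", "d"], "g": ["e", "f"]}
line = [(0, 1), (1, 2), (2, 3)]
full = [(i, j) for i in range(4) for j in range(i + 1, 4)]

line_result = schedule(dfg, 4, line)
full_result = schedule(dfg, 4, full)
```

Comparing `line_result` with `full_result` shows the same graph and the same scheduler giving different energy-delay products purely because of the interconnect. A topology search such as the genetic algorithm mentioned in the abstract could, under these assumptions, use a function like `schedule` as its fitness evaluation.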

