Throughput-oriented kernel porting onto FPGAs

2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC) | Pub Date: 2013-05-29 | DOI: 10.1145/2463209.2488747

Alexandros Papakonstantinou, Deming Chen, Wen-mei W. Hwu, J. Cong, Yun Liang

Citations: 9

Abstract

Reconfigurable devices are often employed in heterogeneous systems due to their low power and parallel processing advantages. An important usability requirement is the support of a homogeneous programming interface. Nevertheless, homogeneous programming interfaces do not eliminate the need for code tweaking to enable efficient mapping of the computation across heterogeneous architectures. In this work, we propose a code optimization framework that analyzes and restructures CUDA kernels optimized for GPU devices in order to facilitate synthesis of high-throughput custom accelerators on FPGAs. The proposed framework enables efficient performance porting without manual code tweaking or annotation by the user. A hierarchical region graph, in tandem with code motions and graph coloring of array variables, is employed to restructure the kernel for high-throughput execution on FPGAs.
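
The abstract mentions graph coloring of array variables but does not describe the procedure. As a rough illustration of the general idea only, and not the paper's actual algorithm, the sketch below greedily colors an interference graph built from array live ranges, so that arrays whose lifetimes never overlap can share one on-chip buffer. The array names, the region indices, and the function `color_arrays` are all hypothetical.

```python
def color_arrays(live_ranges):
    """Greedy coloring of an interference graph built from live ranges.

    live_ranges maps a (hypothetical) array name to an inclusive
    (first_region, last_region) interval. Returns a dict that maps each
    array name to a color index; arrays with the same color could share
    one physical buffer.
    """
    def overlaps(a, b):
        # Two arrays interfere when their live intervals intersect.
        (s1, e1), (s2, e2) = live_ranges[a], live_ranges[b]
        return s1 <= e2 and s2 <= e1

    colors = {}
    # Visit arrays in order of increasing (start, end) of their live range.
    for name in sorted(live_ranges, key=lambda n: live_ranges[n]):
        taken = {colors[other] for other in colors if overlaps(name, other)}
        color = 0
        while color in taken:  # pick the smallest color not used by a neighbor
            color += 1
        colors[name] = color
    return colors


# Assumed example: four arrays live across kernel regions 0..6.
example = {
    "tile_a": (0, 2),
    "tile_b": (1, 3),
    "partial": (3, 5),
    "out": (4, 6),
}
result = color_arrays(example)
num_buffers = max(result.values()) + 1
```

In this assumed example the four arrays fit into two buffers, because `tile_a` and `partial` are never live at the same time, and neither are `tile_b` and `out`.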

