Proceedings ... Annual IEEE Symposium on Field-Programmable Custom Computing Machines. FCCM (Symposium): Latest Publications

Extending High-Level Synthesis for Task-Parallel Programs.

Proceedings ... Annual IEEE Symposium on Field-Programmable Custom Computing Machines. FCCM (Symposium) Pub Date: 2021-05-01 Epub Date: 2021-06-02 DOI: 10.1109/fccm51124.2021.00032

Yuze Chi, Licheng Guo, Jason Lau, Young-Kyu Choi, Jie Wang, Jason Cong

Abstract: C/C++/OpenCL-based high-level synthesis (HLS) has become increasingly popular for field-programmable gate array (FPGA) accelerators in many application domains in recent years, thanks to its competitive quality of results (QoR) and short development cycles compared with the traditional register-transfer level design approach. Yet, limited by the sequential C semantics, it remains challenging to adopt the same highly productive high-level programming approach in many other application domains, where coarse-grained tasks run in parallel and communicate with each other at a fine-grained level. While current HLS tools do support task-parallel programs, their productivity is greatly limited ① in the code development cycle due to poor programmability, ② in the correctness verification cycle due to restricted software simulation, and ③ in the QoR tuning cycle due to slow code generation. Such limited productivity often defeats the purpose of HLS and hinders programmers from adopting HLS for task-parallel FPGA accelerators. In this paper, we extend the HLS C++ language and present a fully automated framework with programmer-friendly interfaces, unconstrained software simulation, and fast hierarchical code generation to overcome these limitations and demonstrate how task-parallel programs can be productively supported in HLS. Experimental results based on a wide range of real-world task-parallel programs show that, on average, the lines of kernel and host code are reduced by 22% and 51%, respectively, which considerably improves programmability. The correctness verification and iterative QoR tuning cycles are shortened by 3.2× and 6.8×, respectively. Our work is open-source at https://github.com/UCLA-VAST/tapa/.

Citations: 32

MixFX-SCORE: Heterogeneous Fixed-Point Compilation of Dataflow Computations

Proceedings ... Annual IEEE Symposium on Field-Programmable Custom Computing Machines. FCCM (Symposium) Pub Date: 2013-12-15 DOI: 10.1109/.62

João Paiva, L. Rodrigues

Abstract: Mixed-precision implementation of computation can deliver area, throughput, and power improvements for dataflow computations over homogeneous fixed-precision circuits without any loss in accuracy. When designing circuits for reconfigurable hardware, we can exercise independent control over the bitwidth of each variable in the computation. However, selecting the best precision for each variable is an NP-hard problem. While traditional solutions use automated heuristics such as simulated annealing or integer linear programming, they still rely on the manual formulation of resource models, which can be tedious and potentially inaccurate due to the unpredictable interactions between different stages of the FPGA CAD flow. To address this challenge, we develop MixFX-SCORE, an automated tool flow based on the FX-SCORE fixed-point compilation framework and simulated annealing. We outsource error analysis (Gappa++) and resource model generation (Vivado HLS, logic synthesis, Xilinx place-and-route) to external tools that offer a more accurate representation of error behavior (backed by proofs) and resource usage (based on actual utilization). Compared with homogeneous fixed-point implementations, we demonstrate 1.1 to 3.5× savings in LUT count, 1 to 1.8× reductions in DSP count, and 1 to 3.9× improvements in dynamic power while still satisfying the accuracy constraints.
Pages: 206-209

Citations: 7

A Hardware MPI Spawn for Distributed Multiprocessing Reconfigurable System on Chip (MP-RSoC)

Proceedings ... Annual IEEE Symposium on Field-Programmable Custom Computing Machines. FCCM (Symposium) Pub Date: 2013-12-15 DOI: 10.1109/FCCM.2014.73

R. C. G. N. Ewo, A. Pinna, B. Granado, M. Mbouenda, H. Fotsin

Citations: 2

Integrated CUDA-to-FPGA Synthesis with Network-on-Chip

Proceedings ... Annual IEEE Symposium on Field-Programmable Custom Computing Machines. FCCM (Symposium) Pub Date: 2009-07-20 DOI: 10.1109/.12

S. Gurumani, Jacob Tolar, Yao Chen, Yun Liang, K. Rupnow, Deming Chen

Citations: 0

Improving Performance of Partial Reconfiguration Using Strategy of Virtual Deletion

Proceedings ... Annual IEEE Symposium on Field-Programmable Custom Computing Machines. FCCM (Symposium) Pub Date: 2008-04-14 DOI: 10.1109/FCCM.2008.51

Tian Hangpei, Gao De-yuan, Wei Wu, Fan Xiao-ya, Zhu Yian

Citations: 1

Enhancing Relocatability of Partial Bitstreams for Run-Time Reconfiguration

Proceedings ... Annual IEEE Symposium on Field-Programmable Custom Computing Machines. FCCM (Symposium) Pub Date: 2007-04-23 DOI: 10.1109/FCCM.2007.51

Tobias Becker, Wayne Luk, Peter Y. K. Cheung

Citations: 2

An FPGA implementation of pipelined multiplicative division with IEEE Rounding

Proceedings ... Annual IEEE Symposium on Field-Programmable Custom Computing Machines. FCCM (Symposium) Pub Date: 2007-04-23 DOI: 10.1109/FCCM.2007.59

Ronen Goldberg, Guy Even, P. Seidel

Citations: 1