Extending High-Level Synthesis for Task-Parallel Programs

Yuze Chi, Licheng Guo, Jason Lau, Young-Kyu Choi, Jie Wang, Jason Cong
Venue: IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), 2021. Published 2021-05-01. DOI: 10.1109/fccm51124.2021.00032
Citations: 32

Abstract

C/C++/OpenCL-based high-level synthesis (HLS) has become increasingly popular for field-programmable gate array (FPGA) accelerators in many application domains in recent years, thanks to its competitive quality of results (QoR) and short development cycles compared with the traditional register-transfer level (RTL) design approach. Yet, limited by the sequential C semantics, it remains challenging to adopt the same highly productive high-level programming approach in many other application domains, where coarse-grained tasks run in parallel and communicate with each other at a fine-grained level. While current HLS tools do support task-parallel programs, their productivity is greatly limited ① in the code development cycle due to poor programmability, ② in the correctness verification cycle due to restricted software simulation, and ③ in the QoR tuning cycle due to slow code generation. Such limited productivity often defeats the purpose of HLS and hinders programmers from adopting HLS for task-parallel FPGA accelerators. In this paper, we extend the HLS C++ language and present a fully automated framework with programmer-friendly interfaces, unconstrained software simulation, and fast hierarchical code generation to overcome these limitations and demonstrate how task-parallel programs can be productively supported in HLS. Experimental results on a wide range of real-world task-parallel programs show that, on average, the lines of kernel and host code are reduced by 22% and 51%, respectively, which considerably improves programmability. The correctness verification and iterative QoR tuning cycles are shortened by 3.2× and 6.8×, respectively. Our work is open-source at https://github.com/UCLA-VAST/tapa/.
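
The task-parallel model described above, where coarse-grained tasks run concurrently and exchange data at a fine-grained level, can be illustrated with a plain C++ software analogue. This is a minimal sketch under stated assumptions: it uses only the C++ standard library, and the `Stream` class and the `Produce`, `Square`, `Consume`, and `RunPipeline` functions are hypothetical names chosen for illustration. They are not TAPA's API and do not reflect the authors' implementation. Each task runs on its own `std::thread`, and tasks communicate only through small blocking FIFOs. Running the tasks concurrently matters here: a one-task-at-a-time execution of the same pipeline would stall as soon as the first FIFO filled up.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded FIFO channel (NOT TAPA's API): writers block when it
// is full and readers block when it is empty, similar to a hardware FIFO.
template <typename T>
class Stream {
 public:
  explicit Stream(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Blocking write; std::nullopt is used as an end-of-stream marker.
  void Write(std::optional<T> value) {
    std::unique_lock<std::mutex> lock(mutex_);
    not_full_.wait(lock, [this] { return queue_.size() < capacity_; });
    queue_.push(std::move(value));
    not_empty_.notify_one();
  }

  // Blocking read; returns std::nullopt once the end marker arrives.
  std::optional<T> Read() {
    std::unique_lock<std::mutex> lock(mutex_);
    not_empty_.wait(lock, [this] { return !queue_.empty(); });
    std::optional<T> value = std::move(queue_.front());
    queue_.pop();
    not_full_.notify_one();
    return value;
  }

 private:
  std::size_t capacity_;
  std::queue<std::optional<T>> queue_;
  std::mutex mutex_;
  std::condition_variable not_full_;
  std::condition_variable not_empty_;
};

// Task 1 (hypothetical): stream input elements one at a time, then an end marker.
void Produce(const std::vector<int>& input, Stream<int>& out) {
  for (int x : input) out.Write(x);
  out.Write(std::nullopt);
}

// Task 2 (hypothetical): square each element as soon as it arrives.
void Square(Stream<int>& in, Stream<int>& out) {
  while (std::optional<int> x = in.Read()) out.Write(*x * *x);
  out.Write(std::nullopt);
}

// Task 3 (hypothetical): accumulate results until the end marker.
void Consume(Stream<int>& in, long long& sum) {
  sum = 0;
  while (std::optional<int> x = in.Read()) sum += *x;
}

// Launch all three tasks concurrently, connected by depth-2 FIFOs, and wait.
long long RunPipeline(const std::vector<int>& input) {
  Stream<int> a(2), b(2);  // small depth forces fine-grained hand-offs
  long long sum = 0;
  std::thread t1(Produce, std::cref(input), std::ref(a));
  std::thread t2(Square, std::ref(a), std::ref(b));
  std::thread t3(Consume, std::ref(b), std::ref(sum));
  t1.join();
  t2.join();
  t3.join();
  return sum;
}
```

For example, `RunPipeline({1, 2, 3, 4})` streams the four elements through the `Square` task and returns 30. The FIFO depth of 2 is an arbitrary choice, deliberately smaller than the input so that the tasks must overlap in time.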
