Source-to-source translation: Impact on the performance of high level synthesis

Meena Belwal, Sudarshan TSB
2017 International Conference on Computing, Communication and Automation (ICCCA), pp. 951-956. Published 2017-05-01. DOI: 10.1109/CCAA.2017.8229944 (https://doi.org/10.1109/CCAA.2017.8229944)
Citations: 1

Abstract

Recent industry developments, such as Microsoft's use of FPGAs (Field Programmable Gate Arrays) to accelerate its Bing search engine and Intel's initiative to place its CPU and an Altera FPGA on the same chip, indicate both the potential of FPGAs and the growing demand for them in high-performance computing. FPGAs provide accelerated computation thanks to their flexible architecture. However, this flexibility creates challenges for the system designer, because a design that is efficient in latency, power, and energy demands hardware programming expertise. Hardware coding is time-consuming and error-prone. High-Level Synthesis (HLS) addresses these challenges by letting programmers write code in high-level languages (HLLs) such as C, C++, SystemC, and CUDA and translating that code to a hardware description language such as Verilog or VHDL. Although HLS tools provide several optimizations, their performance is limited by implementation constraints. Some software constructs widely used in high-level languages, such as dynamic memory allocation, pointer-based data structures, and recursion, are very hard to implement well in hardware and thereby restrict the performance of HLS. Source-to-source translation is a mechanism for optimizing HLL code so that the compiler can optimize it more effectively. This article investigates whether source-to-source translation, widely used with HLLs, can also benefit high-level synthesis. For this study, the Bones source-to-source compiler was selected to translate C code into optimized C (Optimized-C) and OpenMP code. These three types of code (C, Optimized-C, and OpenMP) were synthesized in LegUP HLS for three benchmarks; performance statistics were measured for all nine cases, and the analysis covered speedup, area reduction, power, and energy consumption.
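To make the idea of C-to-OpenMP translation concrete, here is a minimal sketch of what an original loop and an OpenMP-annotated equivalent can look like. It is an illustration only and is not output produced by Bones: the function names `scale_add_serial` and `scale_add_omp` and the loop body are hypothetical, chosen because each iteration is independent of the others.

```c
#include <assert.h>

/* Original form: a plain sequential loop (hypothetical kernel). */
void scale_add_serial(const int *a, const int *b, int *out, int n, int k)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        out[i] = k * a[i] + b[i];
    }
}

/* OpenMP form: the same loop, annotated so that a tool with OpenMP
 * support may distribute the independent iterations across parallel
 * workers. Without OpenMP support the pragma is ignored and the loop
 * runs sequentially, producing the same result. */
void scale_add_omp(const int *a, const int *b, int *out, int n, int k)
{
    assert(n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        out[i] = k * a[i] + b[i];
    }
}
```

Both functions compute the same values for the same inputs; the annotation only gives the downstream tool permission to run iterations in parallel.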
OpenMP code performed better than the original C code in execution time (speedup range 1.86–3.49), area (gain range 1–6.55), and energy (gain range 1.86–3.55). However, the Optimized-C code did not always perform better than the original C code in execution time (speedup range 0.27–3.08), area (gain range 0.83–5.7), or energy (gain range 0.27–3.13). The power statistics observed were almost the same for all three input versions of the code.
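The abstract does not define its metrics, so the following sketch rests on two assumptions: that each gain is the ratio of the baseline value to the optimized value (so values above 1 favor the optimized version), and that energy is average power multiplied by execution time. Under those assumptions, near-equal power across versions makes the energy gain approximately equal to the speedup, which is consistent with how closely the reported energy ranges track the speedup ranges. The function names are hypothetical.

```c
#include <assert.h>

/* Gain of an optimized design over the baseline, assumed here to be
 * baseline / optimized. Values above 1.0 mean the optimized version
 * needs less time, area, or energy. */
double gain(double baseline, double optimized)
{
    assert(optimized > 0.0);
    return baseline / optimized;
}

/* Energy, assumed to be average power multiplied by execution time. */
double energy(double power, double time)
{
    return power * time;
}

/* Energy gain computed from the power and time of both designs. */
double energy_gain(double p_base, double t_base, double p_opt, double t_opt)
{
    return gain(energy(p_base, t_base), energy(p_opt, t_opt));
}
```

For example, with made-up inputs where both designs draw the same power and the optimized design halves the execution time, `energy_gain` and the speedup `gain(t_base, t_opt)` both evaluate to 2.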