HLScope: High-Level Performance Debugging for FPGA Designs

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI:10.1109/FCCM.2017.44

Young-kyu Choi, J. Cong

{"title":"HLScope: High-Level Performance Debugging for FPGA Designs","authors":"Young-kyu Choi, J. Cong","doi":"10.1109/FCCM.2017.44","DOIUrl":null,"url":null,"abstract":"In their quest for further optimization, field-programmable gate array (FPGA) designers often spend considerable time trying to identify the performance bottleneck in a current design. But since FPGAs do not have built-in high-level probes for performance analysis, manual effort is required to insert custom hardware monitors. This, however, is a time-consuming process which calls for automation. Previous work automates the process of inserting hardware monitors into the communication channels or the finite-state machine, but the instrumentation is applied in low-level hardware description languages (HDL) which limits the comprehensibility in identifying the root cause of stalls. Instead, we propose a performance debugging methodology based on high-level synthesis (HLS). High-level analysis allows tracing the cause of stalls on a function or loop level, which provides a more intuitive feedback that can be used to pinpoint the performance bottleneck. In this paper we propose HLScope, a source-to-source transformation framework based on Vivado HLS for automated performance analysis. We present a method for analyzing the information collected from the software simulation to estimate the stall rate and its cause without the need for FPGA bitstream generation. For detailed analysis, an in-FPGA analysis method is proposed that can be natively integrated into the HLS environment. Experiments show that the parameter extraction from the simulation process is orders of magnitude faster than bitstream generation, with a 2.2% cycle difference on average. In-FPGA flow consumes only about 170 LUTs and a BRAM per monitored module and provides cycle-accurate results.","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2017.44","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 26

Abstract

In their quest for further optimization, field-programmable gate array (FPGA) designers often spend considerable time trying to identify the performance bottleneck in a current design. But since FPGAs do not have built-in high-level probes for performance analysis, manual effort is required to insert custom hardware monitors. This, however, is a time-consuming process which calls for automation. Previous work automates the process of inserting hardware monitors into the communication channels or the finite-state machine, but the instrumentation is applied in low-level hardware description languages (HDL) which limits the comprehensibility in identifying the root cause of stalls. Instead, we propose a performance debugging methodology based on high-level synthesis (HLS). High-level analysis allows tracing the cause of stalls on a function or loop level, which provides a more intuitive feedback that can be used to pinpoint the performance bottleneck. In this paper we propose HLScope, a source-to-source transformation framework based on Vivado HLS for automated performance analysis. We present a method for analyzing the information collected from the software simulation to estimate the stall rate and its cause without the need for FPGA bitstream generation. For detailed analysis, an in-FPGA analysis method is proposed that can be natively integrated into the HLS environment. Experiments show that the parameter extraction from the simulation process is orders of magnitude faster than bitstream generation, with a 2.2% cycle difference on average. In-FPGA flow consumes only about 170 LUTs and a BRAM per monitored module and provides cycle-accurate results.

查看原文本刊更多论文

HLScope: FPGA设计的高级性能调试

为了进一步优化，现场可编程门阵列(FPGA)设计人员经常花费大量时间试图确定当前设计中的性能瓶颈。但是，由于fpga没有内置用于性能分析的高级探针，因此需要手动插入自定义硬件监视器。然而，这是一个耗时的过程，需要自动化。以前的工作自动化了将硬件监视器插入通信通道或有限状态机的过程，但仪器仪表是在低级硬件描述语言(HDL)中应用的，这限制了识别停机根本原因的可理解性。相反，我们提出了一种基于高级综合(HLS)的性能调试方法。高级分析允许在函数或循环级别上跟踪停滞的原因，从而提供更直观的反馈，可用于查明性能瓶颈。在本文中，我们提出了HLScope，一个基于Vivado HLS的用于自动性能分析的源到源转换框架。我们提出了一种方法来分析从软件仿真中收集的信息，以估计失速率及其原因，而无需FPGA比特流生成。为了进行详细分析，提出了一种可本地集成到HLS环境中的fpga内分析方法。实验表明，仿真过程中的参数提取比比特流生成快几个数量级，平均周期差为2.2%。fpga内流仅消耗大约170个lut和一个BRAM每个监控模块，并提供周期精确的结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

自引率

0.00%

发文量