Dynamic Inter-Block Scheduling for HLS

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL). Publication date: 2022-08-01. DOI: 10.1109/FPL57034.2022.00045

Jianyi Cheng, Lana Josipović, G. Constantinides, John Wickerson


Citations: 5

Abstract

A recent theme in HLS research is the production of dynamically scheduled circuits, which are made up of components that use handshaking to schedule themselves at run time, as opposed to following a schedule determined statically at compile time. Dynamically scheduled circuits promise superior performance on ‘irregular’ source programs, such as those whose control flow depends on input data, at the cost of additional area. Current dynamic scheduling techniques are well able to exploit parallelism among instructions within each basic block (BB) of the source program, but parallelism between BBs is underexplored. Although current tools allow the operations of different BBs to overlap, they require the BBs to start in strict program order, thus limiting the achievable parallelism and overall performance. We seek to lift this restriction. Doing so involves developing a toolflow that tackles the following challenges: (1) finding consecutive subgraphs in the control-flow graph and using static analysis to identify those subgraphs that can be safely parallelised, and (2) adapting the circuit so that those subgraphs are executed in parallel while ensuring deterministic circuit behaviour and correct usage of memory interfaces. Using two benchmark sets from related works, we compare our proposed toolflow against a state-of-the-art dynamically scheduled HLS tool called Dynamatic. Our results show that after standard loop unrolling is applied, our toolflow yields a 4× average speedup, with a negligible area overhead. This increases to a 7.3× average speedup when our toolflow is further combined with C-slow pipelining.
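
To make the restriction described in the abstract concrete, the following is a minimal, hypothetical C kernel of the kind an HLS tool might take as input. It is not taken from the paper or its benchmark sets; the names `kernel`, `sum_prefix`, `count_halvings` and the size `N` are illustrative assumptions. Each loop has a data-dependent inner trip count (an 'irregular' program), and the two loops are assumed to touch disjoint arrays. If BBs must start in strict program order, the second loop's first block presumably cannot start until the first loop has exited, even though no data flows between the two loops.

```c
#define N 8

/* Loop A: the inner trip count len[i] depends on input data. */
static void sum_prefix(const int len[N], int a[N]) {
    for (int i = 0; i < N; i++) {
        for (int k = 0; k < len[i]; k++) {
            a[i] += k;              /* adds 0 + 1 + ... + (len[i] - 1) */
        }
    }
}

/* Loop B: the while-loop trip count depends on the value of start[j]. */
static void count_halvings(const int start[N], int b[N]) {
    for (int j = 0; j < N; j++) {
        int x = start[j];
        int steps = 0;
        while (x > 1) {             /* halve until x reaches 1 */
            x /= 2;
            steps++;
        }
        b[j] = steps;
    }
}

/*
 * Two consecutive loops in program order. Assumption: the caller passes
 * four distinct, non-overlapping arrays, so loop A reads len and updates a,
 * while loop B reads start and writes b. Under that assumption the loops
 * form consecutive control-flow subgraphs with no memory dependence
 * between them, which is the kind of property a static analysis would
 * need to establish before running them in parallel.
 */
void kernel(const int len[N], const int start[N], int a[N], int b[N]) {
    sum_prefix(len, a);
    count_halvings(start, b);
}
```

The sequential C semantics stay the same whether or not the loops overlap in hardware; the example only shows where inter-block parallelism could be found, not how the toolflow transforms the circuit.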

