2022 32nd International Conference on Field-Programmable Logic and Applications (FPL): Latest Papers

Dynamic Inter-Block Scheduling for HLS
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00045
Jianyi Cheng, Lana Josipović, G. Constantinides, John Wickerson
Abstract: A recent theme in HLS research is the production of dynamically scheduled circuits, which are made up of components that use handshaking to schedule themselves at run time, as opposed to following a schedule determined statically at compile time. Dynamically scheduled circuits promise superior performance on ‘irregular’ source programs, such as those whose control flow depends on input data, at the cost of additional area. Current dynamic scheduling techniques are well able to exploit parallelism among instructions within each basic block (BB) of the source program, but parallelism between BBs is underexplored. Although current tools allow the operations of different BBs to overlap, they require the BBs to start in strict program order, thus limiting the achievable parallelism and overall performance. We seek to lift this restriction. Doing so involves developing a toolflow that tackles the following challenges: (1) finding consecutive subgraphs in the control-flow graph and using static analysis to identify those subgraphs that can be safely parallelised, and (2) adapting the circuit so that those subgraphs are executed in parallel while ensuring deterministic circuit behaviour and correct usage of memory interfaces. Using two benchmark sets from related works, we compare our proposed toolflow against a state-of-the-art dynamically scheduled HLS tool called Dynamatic. Our results show that after standard loop unrolling is applied, our toolflow yields a 4× average speedup, with a negligible area overhead. This increases to a 7.3× average speedup when our toolflow is further combined with C-slow pipelining.
Citations: 5
Reduction of Bitstream Size for Low-Cost iCE40 FPGAs
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00028
Clemens Fritzsch, Jörn Hoffmann, Martin Bogdan
Abstract: Reducing the bitstream size is important to lower external storage requirements and to speed up the reconfiguration of field-programmable gate arrays (FPGAs). The most common methods for bitstream size reduction are based on dedicated hardware elements or dynamic partial reconfiguration. All of these properties are usually missing in low-cost FPGAs such as the Lattice iCE40 device family. In this paper we propose a lightweight compaction approach for iCE40 FPGAs. We present five methods for bitstream compaction: two adapted and three new. The methods work directly on the bitstream by removing unnecessary data and redundant commands. They are applicable independent of the synthesis toolchain and require neither repetition of synthesis steps nor modifications of the target system. Although our focus is on iCE40 devices, we additionally discuss the conditions for applying our approach to other targets. All five methods were implemented in an open-source compaction tool. We evaluate our approach with an iCE40 HX8K FPGA by synthesizing and compacting various projects. As a result, we achieve a reduction in bitstream size and reconfiguration time by up to 79%.
Citations: 0
Optimal Binding and Port Assignment for Loop Pipelining in High-Level Synthesis
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00047
Nicolai Fiege, Patrick Sittel, P. Zipf
Abstract: In order to provide high throughput for custom hardware implementations, academic and commercial high-level synthesis (HLS) tools use loop pipelining by modulo scheduling. When provided a resource allocation and a schedule, the binding algorithm can be used to reduce the number of required lifetime registers (LR) and multiplexers (MUX). Contrary to non-modulo schedules, optimal solutions to the binding problem for implementing modulo schedules with respect to minimizing required LRs and MUXs have not been published. To address this topic, we propose a novel optimal binding algorithm to simultaneously minimize MUX and LR costs for loop pipelining using Integer Linear Programming. We evaluated our algorithm on a set of commonly used benchmark instances from digital signal processing and report that all encountered problems could be solved, with 36.53% of the solutions being optimal within a time limit of only five minutes. Compared to worst case evaluations, we report MUX and LR savings of up to 42.74% and 26.62%, respectively. To evaluate the impact on the resulting circuit after place and route, we studied FPGA implementations of several benchmark instances and recorded look-up table and flip-flop reductions of up to 13.70% and 5.24%, respectively, compared to previous work and to an extensive set of randomly generated bindings when state-of-the-art algorithms fail to find a feasible solution.
Citations: 2
FPL Demo: Hot Reconfiguration - Partial Reconfiguration without Bounds
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00084
Myrtle Shah
Abstract: Traditionally, partial reconfiguration of FPGAs involves replacing defined regions of the design, entirely replacing the logic and losing the state within that region. However, configuration frame reloads typically being glitch free means that wires and logic can safely be added and removed at runtime, without losing state - potentially even without stopping the clock! This could even be extended into an “edit and continue” mode where register positions and unchanged logic are preserved, and only changed logic cones are replaced, to enable small design changes to be made to a live system with only a brief pause and no loss of state.
Citations: 0
Modeling and Exploration of Elastic CGRAs
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00067
Omar Ragheb, Tianyi Yu, David Ma, J. Anderson
Abstract: Elastic design concepts have the potential to bring multiple benefits to coarse-grained reconfigurable array (CGRA) architectures, including the ability to interface with memories having unknown latencies, incorporate run-time variable-latency processing elements, and ease the CGRA mapping challenges of scheduling, placement and routing. However, there are overheads in terms of power, performance and area (PPA) associated with the design and implementation of elastic circuits. In this paper, we quantify these overheads in the CGRA context by first extending an open-source CGRA modelling and exploration framework (CGRA-ME) [4] to allow elastic circuit primitives (e.g. fork, join, merge, diverge, etc.) to be used when composing/modelling a CGRA architecture. We then use this new capability to “elasticize” two widely studied CGRA architectures, ADRES [11] and HyCUBE [8]. The PPA of the elastic versions of the CGRAs is compared with that of their traditional statically scheduled counterparts. We also evaluate the PPA “cost” of several elastic-circuit design points, such as elastic buffer length and inclusion of merge and diverge components.
Citations: 2
A-U3D: A Unified 2D/3D CNN Accelerator on the Versal Platform for Disparity Estimation
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00029
Tianyu Zhang, Dong Li, Hong Wang, Yunzhi Li, Xiang Ma, Wei Luo, Yu Wang, Yang Huang, Yi Li, Yu Zhang, Xinlin Yang, Xijie Jia, Qiang Lin, Lu Tian, Fan Jiang, Dongliang Xie, Hong Luo, Yi Shan
Abstract: 3-Dimensional (3D) convolutional neural networks (CNN) are widely used in the field of disparity estimation. However, 3D CNN is more computationally dense than 2D CNN due to the increase in the disparity dimension. To enable more practical applications in autonomous driving, robotics, and other scenarios on embedded devices, we propose a unified 2D/3D CNN accelerator (A-U3D) design. This design unifies 3D standard / transposed convolution into 2D standard convolution, respectively. Our processing unit can support 2D and 3D convolution in the same mode without additional structures. Based on PSMNet, a 3D-based CNN for disparity estimation, we build a heterogeneous multi-core system integrated with A-U3D in conjunction with CPU, DSP, and AI Engines on the Xilinx Versal ACAP platform. Running the pruned 8-bit model, our A-U3D system achieves 0.289 s latency, which is 11.5× faster than the state-of-the-art solution on the same platform, and reaches an end-to-end (E2E) performance of 10.1 frames per second (FPS). Our proposed system explores the feasibility of deploying 3D CNNs with large workloads on FPGA.
Citations: 2
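The A-U3D abstract mentions mapping 3D standard convolution onto 2D standard convolution but does not say how. One generic way to see that such a mapping is possible is the identity that a 3D convolution equals a sum of 2D convolutions over depth slices. The sketch below illustrates only that identity in plain Python; it is not taken from the paper, and the function names `conv2d`, `conv3d_via_2d`, and `conv3d_direct` are hypothetical.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation on nested lists (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(ow)] for y in range(oh)]


def conv3d_via_2d(volume, kernel):
    """Valid 3D cross-correlation computed using only conv2d calls.

    Assumption: this is a textbook decomposition, not the A-U3D dataflow.
    """
    kd = len(kernel)
    od = len(volume) - kd + 1
    out = []
    for d in range(od):
        # One 2D convolution per kernel depth slice.
        partial = [conv2d(volume[d + k], kernel[k]) for k in range(kd)]
        # Element-wise sum of the kd partial results.
        out.append([[sum(p[y][x] for p in partial)
                     for x in range(len(partial[0][0]))]
                    for y in range(len(partial[0]))])
    return out


def conv3d_direct(volume, kernel):
    """Reference 3D cross-correlation, used only to check the result."""
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    od = len(volume) - kd + 1
    oh = len(volume[0]) - kh + 1
    ow = len(volume[0][0]) - kw + 1
    return [[[sum(volume[d + k][y + i][x + j] * kernel[k][i][j]
                  for k in range(kd) for i in range(kh) for j in range(kw))
              for x in range(ow)] for y in range(oh)] for d in range(od)]
```

With integer inputs the two functions return identical nested lists, which is why a 2D convolution engine plus an accumulator is in principle enough to evaluate a 3D layer.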
Precise Characterizing of FPGAs in Production Systems
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00080
Bardia Babaei, Dirk Koch
Abstract: The deployment of FPGAs in cloud data centers has entailed new security concerns. Although several defensive mechanisms are employed to detect and prevent malicious designs, a health monitoring tool can warn cloud service providers about the failure of implemented defensive fences. This PhD project aims to monitor the health status of internal FPGA resources by performing a precise timing characterization.
Citations: 0
Optimizing Application Mapping for Multi-FPGA Systems with Multi-ejection STDM Switches
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00032
Kohe Ito, Ryota Yasudo, H. Amano
Abstract: Multi-FPGA systems have received attention as a computing cluster for multi-access edge computing (MEC). Also, they can process time-critical jobs with their hardwired logic. For this purpose, the static time-division multiplexing (STDM) network is adopted because it makes latency and bandwidth predictable. However, the overall performance of the STDM network depends on the number of time slots. This paper proposes a new mapping tool that optimizes the application mapping so that the number of slots is minimized. Our tool handles multicasts and the multi-ejection function, which are effective techniques for STDM switches implemented on an FPGA cluster. For applications with all-to-all communication, our experimental results show that the tool reduces the number of time slots by 59–68% with both multicasts and multi-ejection switches.
Citations: 0
Increasing Flexibility of Cloud FPGA Virtualization
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00060
Jinjie Ruan, Yisong Chang, Ke Zhang, Kan Shi, Mingyu Chen, Yungang Bao
Abstract: FPGA virtualization enables multiple tenants to share programmable hardware resources for application acceleration in the cloud. However, this technique is still of limited use in commercial FPGA cloud platforms, mainly because of: 1) the absence of direct programming interfaces to the virtualized FPGA accelerators (vFPGAs) in tenants' virtual machines (VMs), 2) a fixed VM-vFPGA data movement scheme that does not adapt to the wide range of data sizes among different applications, and 3) performance degradation due to unregulated inter-vFPGA competition for limited shareable external resources (e.g., off-chip DRAM bandwidth). To tackle all the above issues, we propose a flexible FPGA virtualization framework and prototype an open cloud platform with ARM SoC-equipped FPGAs. Under this framework, tenants are allowed to directly initiate FPGA partial reconfiguration in isolated VMs via a direct I/O-like vFPGA device driver with as low as 20 ms overhead. A hybrid data movement approach that leverages both memory-mapped I/O and DMA is also introduced in our framework to adaptively guarantee moderate VM-vFPGA bandwidth for various data sizes. Moreover, a lightweight priority-based hardware scheduler is developed to monitor and dynamically allocate off-chip DRAM bandwidth among vFPGAs. Based on our preliminary infrastructure-level evaluation results, the proposed framework and the open prototype are of significant interest to researchers looking to conduct further explorations in FPGA virtualization.
Citations: 1
BunchBloomer: Cost-Effective Bloom Filter Accelerator for Genomics Applications
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00014
Seongyoung Kang, Tarun Sai Ganesh Nerella, Shashank Uppoor, S. Jun
Abstract: Bloom filters are a very important tool for many applications including genomics, where they are used as a compact data structure for counting k-mers, representing de Bruijn graphs, and more. Due to their random-access nature coupled with the large size required for genomics, Bloom filters for genomics can easily become bound by the random access performance of off-chip memory. This is especially true for accelerators such as FPGAs and GPUs, which can easily remove the computation overhead of the multiple hash functions. As a result, Bloom filter accelerators have typically focused either on small filters which can fit in fast on-chip memory, or require fast off-chip memory fabric such as Hybrid Memory Cubes. In this work, we present BunchBloomer, which improves the cost-effectiveness of FPGA Bloom filter accelerators by making better use of cheaper, lower-power DDR memory. BunchBloomer uses a multi-layer radix sorter to group table updates into bursts directed to the same 8 KiB memory region, which can be efficiently cached in on-chip memory. A single BunchBloomer device outperforms a costly 12-core server by over 2×, demonstrating an order of magnitude better power efficiency. It even achieves better power efficiency compared to published FPGA Bloom filter accelerators equipped with Hybrid Memory Cubes.
Citations: 1
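The BunchBloomer abstract describes grouping Bloom filter updates so that each burst targets a single 8 KiB memory region. The following software sketch illustrates that grouping idea only. It swaps the paper's multi-layer radix sorter for a plain dictionary of buckets, and the hashing scheme (double hashing over SHA-256), the default of four hash functions, and all function names are assumptions made for illustration rather than details from the paper.

```python
import hashlib
from collections import defaultdict

REGION_BYTES = 8 * 1024         # 8 KiB region size, taken from the abstract
REGION_BITS = REGION_BYTES * 8  # filter bits held by one region


def bit_positions(item, num_bits, num_hashes):
    """Derive bit positions by double hashing (an assumed scheme)."""
    digest = hashlib.sha256(item).digest()
    h1 = int.from_bytes(digest[:8], "little")
    h2 = int.from_bytes(digest[8:16], "little") | 1  # force a non-zero step
    return [(h1 + i * h2) % num_bits for i in range(num_hashes)]


def insert_batched(bits, items, num_hashes=4):
    """Insert items into the filter, applying updates region by region.

    Returns the number of distinct 8 KiB regions that were written.
    """
    num_bits = len(bits) * 8
    buckets = defaultdict(list)  # region index -> pending bit positions
    for item in items:
        for pos in bit_positions(item, num_bits, num_hashes):
            buckets[pos // REGION_BITS].append(pos)
    # Visit one region at a time so its writes happen back to back,
    # standing in for a burst served from an on-chip cache.
    for region in sorted(buckets):
        for pos in buckets[region]:
            bits[pos // 8] |= 1 << (pos % 8)
    return len(buckets)


def contains(bits, item, num_hashes=4):
    """Return False if item is definitely absent, True if it may be present."""
    num_bits = len(bits) * 8
    return all(bits[pos // 8] & (1 << (pos % 8))
               for pos in bit_positions(item, num_bits, num_hashes))
```

A call such as `insert_batched(bytearray(4 * REGION_BYTES), [b"ACGT", b"CGTA"])` fills a four-region filter. The final bit pattern is the same as with unbatched insertion; batching changes only the order in which memory is touched.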