2022 32nd International Conference on Field-Programmable Logic and Applications (FPL): Latest Papers

Bitfiltrator: A general approach for reverse-engineering Xilinx bitstream formats
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00039
Sahand Kashani, Mahyar Emami, J. Larus
Abstract: As the usage of FPGAs spreads, engineers will inevitably employ them in ways unforeseen, or unwanted, by their manufacturers. Xilinx's toolchains offer multiple points for customizing the FPGA compilation flow, but all flows must end with Vivado as it is the only tool capable of generating the bitstream to program an FPGA. Xilinx does not document its bitstream format, so users who wish to bypass Vivado and modify a bitstream directly must reverse-engineer it to discover the location and format of cells. Prior work has reverse-engineered parts of the bitstream format for security or debugging/instrumentation activities, but no paper has explained how to do this reverse engineering systematically. Code from prior efforts (when available) is hard-coded to reverse engineer a specific device and is difficult or impossible to use for another one. These efforts, focused on applications instead of reverse-engineering, compel engineers who need to modify a bitstream to rediscover unwritten practice. Our work bridges this gap by explaining: (1) the various parameters needed to navigate a bitstream correctly, (2) the experiments to obtain them, and (3) the many pitfalls and erroneous assumptions to avoid while undertaking this endeavor. We demonstrate our technique by using it to extract the bitstream format of initial LUT equations, LUTRAM contents, BRAM contents, and register values in Xilinx UltraScale and UltraScale+ FPGAs. Our methods are implemented in an open-source tool, Bitfiltrator [1], that can extract device layouts and architecture-specific bitstream formats for these cells automatically and without physical access to an FPGA.
Citations: 1
Model-based Generation of Hardware/Software Architectures for Robotics Systems
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00034
Ariel Podlubne, Johannes Mey, Sergio A. Pertuz, U. Assmann, Diana Göhringer
Abstract: Robotic systems compute data from multiple sensors to perform several actions (e.g., path planning, object detection). FPGA-based architectures for such systems may consist of several accelerators to process compute-intensive algorithms. Designing and implementing such complex systems tends to be an arduous task. This work proposes a modeling approach to generate architectures for such applications, compliant with existing robotics middlewares (e.g., ROS, ROS2). The challenge is to have a compact, yet expressive description of the system with just enough information to generate all required components and to integrate existing algorithms. This system model must be generalizable, so it is not application-dependent, and it must exploit the benefits of FPGAs over software solutions. Previous work mainly focused on individual accelerators rather than all components involved in a system and their interactions. The proposed approach exploits the advantages of model-driven engineering and model-based code generation to produce all components, i.e., message converters acting as middleware interfaces and wrappers to integrate algorithms. Data type and data flow analysis are performed to derive the necessary information to generate the components and their connections. Solutions to several identified challenges for generating entire systems from such models are evaluated using four different use cases.
Citations: 1
Synthesized In-BRAM Garbage Collection for Accelerators with Immutable Memory
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00019
Martha Barker, S. Edwards, Martha A. Kim
Abstract: Speed and ease of accelerator design is a growing need. High-level programming languages have provided significant gains in the software world, but lag for hardware. We present a hardware implementation of a garbage collector that automates memory management, one of the major conveniences of modern software languages. Our garbage collector runs concurrently with the application it serves, using already-idle memory slots to do its work with little to no impact on performance. To achieve this, our collector exploits rapid synchronization that is straightforward in hardware but difficult in software. This synchronization enables the collector to interleave with fine pockets of idleness on a per-cycle, per-heap basis. Our collector typically incurred negligible overhead, only slowing the application to wait for collection to free memory when the heaps were so small that the collector could not keep pace with allocation. We also found that our concurrent collector performs best under an eager collection policy that collects garbage well before the application exhausts the available memory. Although this eager strategy performs more collection operations than strictly necessary, the application never pauses because our collector operates entirely in the background.
Citations: 0
HiPR: High-level Partial Reconfiguration for Fast Incremental FPGA Compilation
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00022
Yuanlong Xiao, A. Hota, Dongjoon Park, A. DeHon
Abstract: Partial Reconfiguration (PR) is a key technique in the design of modern FPGAs. However, current PR tools rely heavily on developers to manually conduct PR module definition, floorplanning, and flow control at a low level. Nor do the existing PR tools consider High-Level Synthesis (HLS) languages, which are of great interest to software developers. We propose HiPR, an open-source framework, to bridge the gap between HLS and PR. HiPR allows the developer to define partially reconfigurable C/C++ functions instead of Verilog modules, which benefits FPGA incremental compilation and automates the flow from C/C++ to bitstreams. By mapping Rosetta HLS benchmarks, the incremental compilation can be accelerated by 3–10× compared with the normal Xilinx Vitis flow without performance loss.
Citations: 4
Virtualization of Embedded Reconfigurable Systems
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00078
Cornelia Wulf, Diana Göhringer
Abstract: With the trend to consolidate multiple systems onto the same hardware platform, which can be observed for example in the automotive industry, virtualization of embedded systems becomes increasingly important. Often small and efficient real-time operating systems (RTOS) run alongside general-purpose operating systems (GPOS) with a convenient, high-level application interface. When virtualizing embedded reconfigurable systems, FPGA characteristics such as limited FPGA area and high reconfiguration latencies have to be considered. We present the FPGA virtualization layer L4ReC, which enables the shared usage of reconfigurable resources by several guest operating systems under the constraints imposed by embedded reconfigurable systems. First results target isolation, energy efficiency, and FPGA resource management considering special requirements of guest operating systems.
Citations: 0
SkeletonGCN: A Simple Yet Effective Accelerator For GCN Training
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00073
Chen Wu, Zhuofu Tao, Kun Wang, Lei He
Abstract: Graph Convolutional Networks (GCNs) have shown great results but come with large computation costs and memory overhead. Recently, sampling-based approaches have been proposed to alter input sizes, which allows large GCN workloads to align to hardware constraints. Motivated by this flexibility, we propose an FPGA-based GCN accelerator, named SkeletonGCN, along with multiple software-hardware co-optimizations to improve training efficiency. We first quantize all feature and adjacency matrices of GCN from FP32 to SINT16. We then simplify the non-linear operations to better fit the FPGA computation, and identify reusable intermediate results to eliminate redundant computation. Moreover, we employ a linear-time sparse matrix compression algorithm to further reduce memory bandwidth while allowing efficient decompression on hardware. Finally, we propose a unified hardware architecture to process sparse-dense matrix multiplication (SpMM) and dense matrix multiplication (MM), all on the same group of PEs to increase DSP utilization on FPGA. Evaluation is performed on a Xilinx Alveo U200 board. Compared with an existing FPGA-based accelerator on the same network architecture, SkeletonGCN can achieve up to 11.3x speedup while maintaining the same training accuracy. In addition, SkeletonGCN can achieve up to 178x and 13.1x speedup over state-of-the-art CPU and GPU implementations on popular datasets, respectively.
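The FP32-to-SINT16 quantization step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which quantization scheme the authors use, so the symmetric per-tensor scaling below, and the names `quantize_sint16` and `dequantize`, are assumptions made purely for illustration.

```python
def quantize_sint16(values):
    """Symmetric per-tensor quantization of floats to signed 16-bit integers.

    Assumed scheme (not taken from the paper): one shared scale maps the
    largest magnitude in the tensor to 32767.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        # An all-zero (or empty) tensor needs no scaling.
        return [0 for _ in values], 1.0
    scale = max_abs / 32767
    # Round to the nearest integer and clamp to the symmetric int16 range.
    quantized = [max(-32767, min(32767, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the 16-bit integers back to approximate float values."""
    return [q * scale for q in quantized]


features = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_sint16(features)
restored = dequantize(q, scale)
```

Each restored value differs from its original by at most about half a quantization step (`scale / 2`), which is the kind of bounded error that lets 16-bit integer arithmetic stand in for FP32 in this sort of design.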
Citations: 4
TRAC: Compilation-Based Design of Transformer Accelerators for FPGAs
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00015
Patrick Plagwitz, Frank Hannig, Jürgen Teich
Abstract: Transformer-type Neural Networks (NNs) have shown impressive accuracy numbers in Natural Language Processing (NLP) applications where Recurrent Neural Networks (RNNs) have been in use before, even surpassing them. However, because transformers differ considerably from common types of NNs, existing accelerator designs, particularly for Field-Programmable Gate Arrays (FPGAs), cannot be used to implement them. Previous research has shown FPGAs to be platforms superior to CPUs and even GPUs for accelerating NNs when it comes to energy efficiency. Following the development of automated compiler-based design flows for NNs, there is still a lack of such an approach for transformers and FPGA targets. In this realm, this paper presents a novel compiler called TRAC as well as a library of operators and modules for implementing transformer accelerators on FPGAs. Based on optimization and code generation settings in the compiler using an integrated approach combining weight compression techniques with corresponding adaptations of the accelerator modules, a design space of accelerators is defined and explored. For each design, a system-level data path and control unit architecture is generated, which integrates module-level designs using hierarchical High-Level Synthesis (HLS). We evaluate our implementation for the BERT network and provide results regarding the trade-off between execution time, accuracy, and FPGA resource usage.
Citations: 1
Ultra Low Latency Machine Learning for Scientific Edge Applications
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00068
Narasinga Rao Miniskar, Aaron R. Young, Frank Liu, W. Blokland, A. Cabrera, J. Vetter
Abstract: In this paper, we present an FPGA design of an extremely low latency scientific machine learning application at the edge. Real-time prediction of errant high-energy particle beams at scientific facilities such as the Spallation Neutron Source (SNS) is crucial to avoid damage to the equipment. Machine learning techniques are becoming increasingly effective at detecting subtle signatures of errant beams in noisy sensor signals. However, to minimize potential damage done by an errant beam, real-time errant beam detection has to be completed with extremely low latency, usually less than 1 microsecond. By stream processing the input features and employing out-of-order execution of decision nodes among the decision trees, we demonstrate that our highly efficient FPGA implementation can achieve 60 nanoseconds of computing latency for complex random forest models with 10,000 input features.
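To make the idea of evaluating decision nodes independently of tree order more concrete, here is a small software sketch. It is not the authors' FPGA design; the node layout, the function names, and the labels (0 for a normal beam, 1 for an errant beam) are hypothetical. The point it illustrates is that each node comparison depends on a single input feature, so all comparisons can be computed in any order (or in parallel) before the tree is walked.

```python
# Hypothetical node layout:
#   internal node: (feature_index, threshold, left_child, right_child)
#   leaf node:     ("leaf", label)

def evaluate_comparisons(tree, features):
    """Compute every node's comparison up front.

    Each result depends only on one feature, so the order of evaluation
    does not matter; hardware could compute them as features stream in.
    """
    return [
        None if node[0] == "leaf" else features[node[0]] <= node[1]
        for node in tree
    ]


def predict_tree(tree, features):
    """Walk one tree using the precomputed comparison bits."""
    bits = evaluate_comparisons(tree, features)
    index = 0
    while tree[index][0] != "leaf":
        _, _, left, right = tree[index]
        index = left if bits[index] else right
    return tree[index][1]


def predict_forest(forest, features):
    """Majority vote over all trees (use an odd tree count to avoid ties)."""
    votes = [predict_tree(tree, features) for tree in forest]
    return max(set(votes), key=votes.count)


tree_a = [(0, 0.5, 1, 2), ("leaf", 0), ("leaf", 1)]
tree_b = [(1, 2.0, 1, 2), ("leaf", 0), ("leaf", 1)]
tree_c = [(0, 0.1, 1, 2), ("leaf", 0), ("leaf", 1)]
forest = [tree_a, tree_b, tree_c]
features = [0.3, 3.0]

print(predict_forest(forest, features))  # prints 1
```

Separating the comparison step from the traversal step is a design choice made here for clarity: it mirrors, in a very simplified way, how comparisons could be decoupled from the sequential root-to-leaf path.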
Citations: 1
Design of High-Throughput Mixed-Precision CNN Accelerators on FPGA
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00061
Cecilia Latotzke, Tim Ciesielski, T. Gemmeke
Abstract: Convolutional Neural Networks (CNNs) reach high accuracies in various application domains, but require large amounts of computation and incur costly data movements. One method to decrease these costs while trading accuracy is weight and/or activation word-length reduction. Thereby, layer-wise mixed-precision quantization allows for more efficient results while inflating the design space. In this work, we present an in-depth quantitative methodology to efficiently explore the design space considering the limited hardware resources of a given FPGA. Our holistic exploration approach vertically traverses the various design entry levels from the architectural down to the logic level, and laterally covers optimization from processing elements to dataflow for an efficient mixed-precision CNN accelerator. Our resulting hardware accelerators implement truly mixed-precision operations that enable efficient execution of layer-wise and channel-wise quantized CNNs. Mapping feed-forward and identity-shortcut-connection mixed-precision CNNs results in competitive accuracy-throughput trade-offs: 245 frames/s with 87.48% Top-5 accuracy for ResNet-18 and 92.9% Top-5 accuracy with 1.13 TOps/s for ResNet-152, respectively. Thereby, the required memory footprint for parameters is reduced by 4.9× and 9.4× compared to the respective floating-point baseline.
Citations: 3
Spade: An HDL Inspired by Modern Software Languages
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00075
Frans Skarman, O. Gustafsson
Abstract: Spade is a new hardware description language which aims to make hardware description easier and less error-prone. It does this by taking lessons from software programming languages, and adding language-level support for common hardware constructs, all without compromising the low-level control over what hardware gets generated.
Citations: 0