2022 32nd International Conference on Field-Programmable Logic and Applications (FPL): Latest Papers, Page 3

Bitfiltrator: A general approach for reverse-engineering Xilinx bitstream formats

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00039

Sahand Kashani, Mahyar Emami, J. Larus

Abstract: As the usage of FPGAs spreads, engineers will inevitably employ them in ways unforeseen, or unwanted, by their manufacturers. Xilinx's toolchains offer multiple points for customizing the FPGA compilation flow, but all flows must end with Vivado, as it is the only tool capable of generating the bitstream to program an FPGA. Xilinx does not document its bitstream format, so users who wish to bypass Vivado and modify a bitstream directly must reverse-engineer it to discover the location and format of cells. Prior work has reverse-engineered parts of the bitstream format for security or debugging/instrumentation activities, but no paper has explained how to do this reverse engineering systematically. Code from prior efforts (when available) is hard-coded to reverse-engineer a specific device and is difficult or impossible to use for another one. These efforts, focused on applications instead of reverse engineering, compel engineers who need to modify a bitstream to rediscover unwritten practice. Our work bridges this gap by explaining: (1) the various parameters needed to navigate a bitstream correctly, (2) the experiments to obtain them, and (3) the many pitfalls and erroneous assumptions to avoid while undertaking this endeavor. We demonstrate our technique by using it to extract the bitstream format of initial LUT equations, LUTRAM contents, BRAM contents, and register values in Xilinx UltraScale and UltraScale+ FPGAs. Our methods are implemented in an open-source tool, Bitfiltrator [1], that can extract device layouts and architecture-specific bitstream formats for these cells automatically and without physical access to an FPGA.

Citations: 1

Model-based Generation of Hardware/Software Architectures for Robotics Systems

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00034

Ariel Podlubne, Johannes Mey, Sergio A. Pertuz, U. Assmann, Diana Göhringer

Abstract: Robotic systems compute data from multiple sensors to perform several actions (e.g., path planning, object detection). FPGA-based architectures for such systems may consist of several accelerators to process compute-intensive algorithms. Designing and implementing such complex systems tends to be an arduous task. This work proposes a modeling approach to generate architectures for such applications, compliant with existing robotics middlewares (e.g., ROS, ROS2). The challenge is to have a compact, yet expressive description of the system with just enough information to generate all required components and to integrate existing algorithms. This system model must be generalizable, so it is not application-dependent, and it must exploit the benefits of FPGAs over software solutions. Previous work mainly focused on individual accelerators rather than all components involved in a system and their interactions. The proposed approach exploits the advantages of model-driven engineering and model-based code generation to produce all components, i.e., message converters acting as middleware interfaces and wrappers to integrate algorithms. Data type and data flow analysis are performed to derive the necessary information to generate the components and their connections. Solutions to several identified challenges for generating entire systems from such models are evaluated using four different use cases.

Citations: 1

Synthesized In-BRAM Garbage Collection for Accelerators with Immutable Memory

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00019

Martha Barker, S. Edwards, Martha A. Kim

Abstract: Speed and ease of accelerator design is a growing need. High-level programming languages have provided significant gains in the software world, but lag for hardware. We present a hardware implementation of a garbage collector that automates memory management, one of the major conveniences of modern software languages. Our garbage collector runs concurrently with the application it serves, using already-idle memory slots to do its work with little to no impact on performance. To achieve this, our collector exploits rapid synchronization that is straightforward in hardware but difficult in software. This synchronization enables the collector to interleave with fine pockets of idleness on a per-cycle, per-heap basis. Our collector typically incurred negligible overhead, only slowing the application to wait for collection to free memory when the heaps were so small that the collector could not keep pace with allocation. We also found that our concurrent collector performs best under an eager collection policy that collects garbage well before the application exhausts the available memory. Although this eager strategy performs more collection operations than strictly necessary, the application never pauses because our collector operates entirely in the background.

Citations: 0

HiPR: High-level Partial Reconfiguration for Fast Incremental FPGA Compilation

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00022

Yuanlong Xiao, A. Hota, Dongjoon Park, A. DeHon

Citations: 4

Virtualization of Embedded Reconfigurable Systems

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00078

Cornelia Wulf, Diana Göhringer

Citations: 0

SkeletonGCN: A Simple Yet Effective Accelerator For GCN Training

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00073

Chen Wu, Zhuofu Tao, Kun Wang, Lei He

Abstract: Graph Convolutional Networks (GCNs) have shown great results but come with large computation costs and memory overhead. Recently, sampling-based approaches have been proposed to alter input sizes, which allows large GCN workloads to align to hardware constraints. Motivated by this flexibility, we propose an FPGA-based GCN accelerator, named SkeletonGCN, along with multiple software-hardware co-optimizations to improve training efficiency. We first quantize all feature and adjacency matrices of GCN from FP32 to SINT16. We then simplify the non-linear operations to better fit the FPGA computation, and identify reusable intermediate results to eliminate redundant computation. Moreover, we employ a linear-time sparse matrix compression algorithm to further reduce memory bandwidth while allowing efficient decompression on hardware. Finally, we propose a unified hardware architecture to process sparse-dense matrix multiplication (SpMM) and dense matrix multiplication (MM), all on the same group of PEs to increase DSP utilization on FPGA. Evaluation is performed on a Xilinx Alveo U200 board. Compared with an existing FPGA-based accelerator on the same network architecture, SkeletonGCN can achieve up to 11.3x speedup while maintaining the same training accuracy. In addition, SkeletonGCN can achieve up to 178x and 13.1x speedup over state-of-the-art CPU and GPU implementations on popular datasets, respectively.
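The FP32-to-SINT16 quantization step named in the SkeletonGCN abstract can be illustrated with a minimal sketch. The abstract does not say which quantization scheme the authors use, so the symmetric per-tensor linear scaling below is an assumption, and the function names `quantize_sint16` and `dequantize` are hypothetical.

```python
def quantize_sint16(values):
    """Map floats to signed 16-bit integers with one shared scale.

    Assumption: symmetric per-tensor linear quantization; the paper's
    actual scheme is not described in the abstract.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # The largest magnitude maps to 32767; fall back to 1.0 for all-zero input.
    scale = max_abs / 32767 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the SINT16 range.
    quantized = [max(-32768, min(32767, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    features = [0.25, -1.0, 0.0, 0.731]  # hypothetical feature values
    q, s = quantize_sint16(features)
    print(q, s)
    print(dequantize(q, s))
```

Each value then occupies 16 bits instead of 32, which halves the storage and bandwidth for the quantized matrices; the rounding error per element is bounded by roughly half of `scale`.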

Citations: 4

TRAC: Compilation-Based Design of Transformer Accelerators for FPGAs

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00015

Patrick Plagwitz, Frank Hannig, Jürgen Teich

Abstract: Transformer-type Neural Networks (NNs) have shown impressive accuracy numbers in Natural Language Processing (NLP) applications where Recurrent Neural Networks (RNNs) have been in use before, even surpassing them. However, differing considerably from common types of NNs, existing accelerator designs, particularly for Field-Programmable Gate Arrays (FPGAs), cannot be used to implement them. Previous research has shown FPGAs to be platforms superior to CPUs and even GPUs for accelerating NNs when it comes to energy efficiency. Following the development of automated compiler-based design flows for NNs, there is still a lack of such an approach for transformers and FPGA targets. In this realm, this paper presents a novel compiler called TRAC as well as a library of operators and modules for implementing transformer accelerators on FPGAs. Based on optimization and code generation settings in the compiler, using an integrated approach combining weight compression techniques with corresponding adaptations of the accelerator modules, a design space of accelerators is defined and explored. For each design, a system-level data path and control unit architecture is generated, which integrates module-level designs using hierarchical High-Level Synthesis (HLS). We evaluate our implementation for the BERT network and provide results regarding the trade-off between execution time, accuracy, and FPGA resource usage.

Citations: 1

Ultra Low Latency Machine Learning for Scientific Edge Applications

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00068

Narasinga Rao Miniskar, Aaron R. Young, Frank Liu, W. Blokland, A. Cabrera, J. Vetter

Citations: 1

Design of High-Throughput Mixed-Precision CNN Accelerators on FPGA

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00061

Cecilia Latotzke, Tim Ciesielski, T. Gemmeke

Abstract: Convolutional Neural Networks (CNNs) reach high accuracies in various application domains, but require large amounts of computation and incur costly data movements. One method to decrease these costs while trading accuracy is weight and/or activation word-length reduction. Thereby, layer-wise mixed-precision quantization allows for more efficient results while inflating the design space. In this work, we present an in-depth quantitative methodology to efficiently explore the design space considering the limited hardware resources of a given FPGA. Our holistic exploration approach vertically traverses the various design entry levels from the architectural down to the logic level, and laterally covers optimization from processing elements to dataflow for an efficient mixed-precision CNN accelerator. Our resulting hardware accelerators implement truly mixed-precision operations that enable efficient execution of layer-wise and channel-wise quantized CNNs. Mapping feed-forward and identity-shortcut-connection mixed-precision CNNs results in competitive accuracy-throughput trade-offs: 245 frames/s with 87.48% Top-5 accuracy for ResNet-18 and 92.9% Top-5 accuracy with 1.13 TOps/s for ResNet-152, respectively. Thereby, the required memory footprint for parameters is reduced by 4.9× and 9.4× compared to the respective floating-point baseline.
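How layer-wise word-length reduction translates into a parameter memory reduction factor can be shown with a small worked calculation. The layer sizes and bit widths below are hypothetical and are not taken from the paper; the 32-bit floating-point baseline is likewise an assumption, and `footprint_bits` is an illustrative helper name.

```python
def footprint_bits(layers, baseline_bits=32):
    """Compare parameter storage for a mixed-precision network.

    layers: list of (parameter_count, word_length_in_bits) pairs, one per layer.
    Returns (baseline_total_bits, quantized_total_bits, reduction_factor).
    """
    baseline = sum(count * baseline_bits for count, _ in layers)
    quantized = sum(count * bits for count, bits in layers)
    return baseline, quantized, baseline / quantized


# Hypothetical three-layer network: (parameter count, bits per weight).
HYPOTHETICAL_LAYERS = [(1000, 8), (3000, 4), (500, 2)]

if __name__ == "__main__":
    base, quant, factor = footprint_bits(HYPOTHETICAL_LAYERS)
    print(f"baseline: {base} bits, quantized: {quant} bits, reduction: {factor:.2f}x")
```

The reduction factor is a parameter-weighted average, so large layers quantized to short word lengths dominate the result, while a small layer kept at high precision costs little.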

Citations: 3

Spade: An HDL Inspired by Modern Software Languages

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00075

Frans Skarman, O. Gustafsson

Citations: 0