{"title":"Demos and PhD Forum","authors":"","doi":"10.1109/fpl57034.2022.00011","DOIUrl":"https://doi.org/10.1109/fpl57034.2022.00011","url":null,"abstract":"","PeriodicalId":380116,"journal":{"name":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115344028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPL Demo: FPGA Bitstream Virus Scanning","authors":"Joseph Powell, Kaspar Matas, Kristiyan Manev, Dirk Koch","doi":"10.1109/FPL57034.2022.00085","DOIUrl":"https://doi.org/10.1109/FPL57034.2022.00085","url":null,"abstract":"The expansion of the FPGA into complex market sectors imposes new demands on the security model of the devices. This demonstration shows off a series of tools developed to decode and scan the contents of a bitstream for malicious designs.","PeriodicalId":380116,"journal":{"name":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116737420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Heap Management in High-Level Synthesis for Many-Accelerator Architectures","authors":"Argyris Kokkinis, D. Diamantopoulos, K. Siozios","doi":"10.1109/FPL57034.2022.00051","DOIUrl":"https://doi.org/10.1109/FPL57034.2022.00051","url":null,"abstract":"Dynamic Memory Management (DMM) in High-Level Synthesis has been introduced as a promising solution for optimizing the accelerators' memory usage and reducing the occupied on-chip area. Schemes for dynamic memory allocation have been suggested for many-accelerator architectures where memory sharing and resource reusing has the potential to increase the number of synthesized accelerators, raising the throughput per Watt ratio. However, in those architectures, the simultaneous execution of many accelerators may reduce memory efficiency, increasing the Memory Allocation Failures (MAFs) as a consequence of the sub-optimal utilization of the shared memories. MAFs due to memory fragmentation can reach up to 38.5% of the overall memory allocation failures when accelerators with heterogeneous allocation sizes are executed in parallel in a shared memory space. In this manuscript we propose an HLS methodology for minimizing MAFs for many-accelerator DMM frameworks that are caused by on-chip inefficient memory utilization. Our proposed methodology is orthogonal to the static memory allocation techniques of the Xilinx Vitis suite and was evaluated using Xilinx Vitis/Vitis HLS 2020.1 on an Alveo U200 FPGA device as an extension of the Memluv DMM framework. In the experimental results we show that our proposed methodology may decrease up to 38.5% the MAFs due to fragmentation and up to 91% the overall allocation fails with a controllable increase on the utilized resources and a on the accelerators' latency.","PeriodicalId":380116,"journal":{"name":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128697374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Lightweight Multi-Attack CAN Intrusion Detection System on Hybrid FPGAs","authors":"Shashwat Khandelwal, Shanker Shreejith","doi":"10.1109/FPL57034.2022.00070","DOIUrl":"https://doi.org/10.1109/FPL57034.2022.00070","url":null,"abstract":"Rising connectivity in vehicles is enabling new capabilities like connected autonomous driving and advanced driver assistance systems (ADAS) for improving the safety and reliability of next-generation vehicles. This increased access to in-vehicle functions compromises critical capabilities that use legacy in-vehicle networks like Controller Area Network (CAN), which has no inherent security or authentication mechanism. Intrusion detection and mitigation approaches, particularly using machine learning models, have shown promising results in detecting multiple attack vectors in CAN through their ability to generalise to new vectors. However, most deployments require dedicated computing units like GPUs to perform line-rate detection, consuming much higher power. In this paper, we present a lightweight multi-attack quantised machine learning model that is deployed using Xilinx's Deep Learning Processing Unit IP on a Zynq Ultrascale+ (XCZU3EG) FPGA, which is trained and validated using the public CAN Intrusion Detection dataset. The quantised model detects denial of service and fuzzing attacks with an accuracy of above 99% and a false positive rate of 0.07%, which are comparable to the state-of-the-art techniques in the literature. The Intrusion Detection System (IDS) execution consumes just 2.0 W with software tasks running on the ECU and achieves a 25% reduction in per-message processing latency over the state-of-the-art implementations. This deployment allows the ECU function to coexist with the IDS with minimal changes to the tasks, making it ideal for real-time IDS in in-vehicle systems.","PeriodicalId":380116,"journal":{"name":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","volume":"34 41","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131501322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining CryptoNight-Haven on the Varium C1100 Blockchain Accelerator Card","authors":"Lucas Bex, Furkan Turan, Michiel Van Beirendonck, I. Verbauwhede","doi":"10.1109/FPL57034.2022.00074","DOIUrl":"https://doi.org/10.1109/FPL57034.2022.00074","url":null,"abstract":"Cryptocurrency mining is an energy-intensive process that presents a prime candidate for hardware acceleration. This work-in-progress presents the first coprocessor design for the ASIC-resistant CryptoNight-Haven Proof of Work (PoW) algorithm. We construct our hardware accelerator as a Xilinx Run Time (XRT) RTL kernel targeting the Xilinx Varium C1100 Blockchain Accelerator Card. The design employs deeply pipelined computation and High Bandwidth Memory (HBM) for the underlying scratchpad data. We aim to compare our accelerator to existing CPU and GPU miners to show increased throughput and energy efficiency of its hash computations.","PeriodicalId":380116,"journal":{"name":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131506731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPL Demo: An FPGA-IP Prototype Chip for MEC devices","authors":"M. Kuga, M. Iida, H. Amano","doi":"10.1109/FPL57034.2022.00083","DOIUrl":"https://doi.org/10.1109/FPL57034.2022.00083","url":null,"abstract":"This demonstration shows a prototype chip of SLM (Scalable Logic Module) for a novel FPGA-IP embedded in various chips for edge computing. In this paper, the authors briefly describe the architecture of our FPGA-IP and the evaluation environment for the prototype chip.","PeriodicalId":380116,"journal":{"name":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131273601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA Roofline modeling and its Application to Visual SLAM","authors":"Ioanna–Maria Panagou, M. Gkeka, Alexandros Patras, S. Lalis, C. Antonopoulos, Nikolaos Bellas","doi":"10.1109/FPL57034.2022.00030","DOIUrl":"https://doi.org/10.1109/FPL57034.2022.00030","url":null,"abstract":"The Roofline model has been proposed to visually associate application performance against the computational and bandwidth capabilities of the underlying platform. Since FPGAs lack fixed operation units, modifications in the original CPU-based Roofline model should be made. In this paper, we propose a new application-centric approach to construct the FPGA Roofline model extending previous work and encompassing resource and latency constraints to provide a more fitting ceiling. Moreover, we generalize our model to accommodate platforms with multiple accelerators whose execution footprint may be strongly input-dependent due to conditionals and complex loop structures. We evaluate our model and compare it with previous models on KinectFusion, a complex, multi-kernel algorithm for visual Simultaneous Localization and Mapping (vSLAM) used for autonomous agent navigation. Our work makes it feasible to deploy Roofline analysis on a wider range of MPSoC-based FPGAs that consist of more complex HW/SW components and not just single accelerators.","PeriodicalId":380116,"journal":{"name":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131287102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Framework for Intrinsic Evolvable Systems","authors":"Najdet Charaf, Diana Göhringer","doi":"10.1109/FPL57034.2022.00077","DOIUrl":"https://doi.org/10.1109/FPL57034.2022.00077","url":null,"abstract":"Systems with hardware that can dynamically and autonomously change their architecture and behavior by interacting with their environment are becoming very valuable in modern applications. Therefore, research and development in intrinsically evolvable embedded systems are becoming increasingly attractive. Runtime reconfiguration and relocation are a promising approach for designing self-adaptive and self-optimizing autonomous embedded systems. The vision behind this PhD work is to provide an all-encompassing framework to automate all the challenging tasks required for designing self-adaptive systems. This paper presents our framework and preliminary results and highlights our next steps and future work.","PeriodicalId":380116,"journal":{"name":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117147983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Graph Neural Networks for Jet Tagging in Particle Physics on FPGAs","authors":"Zhiqiang Que, Marcus Loo, Hongxiang Fan, M. Pierini, A. Tapper, W. Luk","doi":"10.1109/FPL57034.2022.00057","DOIUrl":"https://doi.org/10.1109/FPL57034.2022.00057","url":null,"abstract":"This work proposes a novel reconfigurable architecture for reducing the latency of JEDI-net, a Graph Neural Network (GNN) based algorithm for jet tagging in particle physics, which achieves state-of-the-art accuracy. Accelerating JEDI-net is challenging since it requires low latency to deploy the network for event selection at the CERN Large Hadron Collider. This paper proposes an outer-product based matrix multiplication approach customized for GNN-based JEDI-net, which increases data spatial locality and reduces design latency. It is further enhanced by code transformation with strength reduction which exploits sparsity patterns and binary adjacency matrices to increase hardware efficiency while reducing latency. In addition, a customizable template for this architecture has been designed and open-sourced, which enables the generation of low-latency FPGA designs with efficient resource utilization using high-level synthesis tools. Evaluation results show that our FPGA implementation is up to 9.5 times faster and consumes up to 6.5 times less power than a GPU implementation. Moreover, the throughput of our FPGA design is sufficiently high to enable deployment of JEDI-net in a sub-microsecond, real-time collider trigger system, enabling it to benefit from improved accuracy.","PeriodicalId":380116,"journal":{"name":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","volume":"3 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128803960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accurate, Low-latency, Efficient SAR Automatic Target Recognition on FPGA","authors":"Bingyi Zhang, R. Kannan, V. Prasanna, Carl E. Busart","doi":"10.1109/FPL57034.2022.00013","DOIUrl":"https://doi.org/10.1109/FPL57034.2022.00013","url":null,"abstract":"Synthetic aperture radar (SAR) automatic target recognition (ATR) is the key technique for remote-sensing image recognition. The state-of-the-art convolutional neural networks (CNNs) for SAR ATR suffer from high computation cost and large memory footprint, making them unsuitable to be deployed on resource-limited platforms, such as small/micro satellites. In this paper, we propose a comprehensive GNN-based model-architecture co-design on FPGA to address the above issues. Model design: we design a novel graph neural network (GNN) for SAR ATR. The proposed GNN model incorporates GraphSAGE layer operators and attention mechanism, achieving comparable accuracy as the state-of-the-art work with near 1/100 computation cost. Then, we propose a pruning approach including weight pruning and input pruning. While weight pruning through lasso regression reduces most parameters without accuracy drop, input pruning eliminates most input pixels with negligible accuracy drop. Architecture design: to fully unleash the computation parallelism within the proposed model, we develop a novel unified hardware architecture that can execute various computation kernels (feature aggregation, feature transformation, graph pooling). The proposed hardware design adopts the Scatter-Gather paradigm to efficiently handle the irregular computation patterns of various computation kernels. We deploy the proposed design on an embedded FPGA (AMD Xilinx ZCU104) and evaluate the performance using MSTAR dataset. Compared with the state-of-the-art CNNs, the proposed GNN achieves comparable accuracy with 1/3258 computation cost and 1/83 model size. Compared with the state-of-the-art CPU/GPU, our FPGA accelerator achieves 14.8×/2.5× speedup (latency) and is 62×/39× more energy efficient.","PeriodicalId":380116,"journal":{"name":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125549968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}