2022 32nd International Conference on Field-Programmable Logic and Applications (FPL): Latest Papers, Page 8

The Design Method of Logic Circuits based on the Voltage-Input Enhanced Scouting Logic Gates

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00031

Fan Liu, S. Zhang, Xiaole Cui

Citations: 0

Direct Device-to-Device Physical Page Migrations in Multi-FPGA Shared Virtual Memory Systems

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00043

Torben Kalkhof, A. Koch

Abstract: Shared Virtual Memory (SVM) is a proven approach to simplify the programming of heterogeneous computing systems. It enables a single virtual address space across all computing devices, even for systems having Non-Uniform Memory Accesses (NUMA) across devices. Access time spikes due to NUMA can be reduced, though, by performing physical page migrations in SVM. These migrations ensure high data locality by moving the underlying memory pages close to the computing device currently working on the contained data, and allow the devices to fault-in pages from remote to local memories autonomously. The main contribution of this work is the implementation of an open-source framework enabling scalable SVM for multi-FPGA architectures, and providing efficient device-to-device page migrations. We compare the runtime of on-demand and user-managed migrations, and examine three different communication mechanisms for the actual board-to-board data transfers. Our framework supports both low-latency and high-throughput operations, requiring, e.g., only 11.6 μs to migrate a single 4 kB page between physical memories on different boards, and 760 μs to migrate an entire 4 MB range of memory.
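
The two timings quoted in the abstract imply very different effective bandwidths, which is what the low-latency versus high-throughput distinction comes down to. The following is a small arithmetic sketch, assuming binary units (4 kB = 4096 bytes, 4 MB = 4 × 1024² bytes); the abstract does not say which convention is meant, and `effective_bandwidth` is a name chosen here for illustration.

```python
def effective_bandwidth(num_bytes, seconds):
    """Return the effective transfer bandwidth in bytes per second."""
    return num_bytes / seconds


# Assumption: binary units; the abstract does not state the convention.
PAGE_BYTES = 4 * 1024
RANGE_BYTES = 4 * 1024 * 1024

# Timings taken from the abstract: 11.6 us per page, 760 us per 4 MB range.
single_page_bw = effective_bandwidth(PAGE_BYTES, 11.6e-6)
range_bw = effective_bandwidth(RANGE_BYTES, 760e-6)

print(f"single 4 kB page: {single_page_bw / 1e6:.0f} MB/s")
print(f"4 MB range:       {range_bw / 1e9:.2f} GB/s")
print(f"ratio:            {range_bw / single_page_bw:.1f}x")
```

Under these assumptions the single-page case works out to roughly 353 MB/s and the 4 MB range to roughly 5.5 GB/s, which suggests that fixed per-migration overhead dominates small transfers.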

Citations: 0

FSHMEM: Supporting Partitioned Global Address Space on FPGAs for Large-Scale Hardware Acceleration Infrastructure

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-07-11 DOI: 10.1109/FPL57034.2022.00042

Yashael F. Arthanto, David Ojika, Joo-Young Kim

Abstract: By providing highly efficient one-sided communication with a globally shared memory space, Partitioned Global Address Space (PGAS) has become one of the most promising parallel computing models in high-performance computing (HPC). Meanwhile, FPGAs are gaining attention as an alternative compute platform for HPC systems, with the benefit of custom computing and design flexibility. However, unlike the traditional message passing interface, PGAS has not been explored on FPGAs. This paper proposes FSHMEM, a software/hardware framework that enables the PGAS programming model on FPGAs. We implement the core functions of the GASNet specification on FPGA for native PGAS integration in hardware, while its programming interface is designed to be highly compatible with legacy software. Our experiments show that FSHMEM achieves a peak bandwidth of 3813 MB/s, which is more than 95% of the theoretical maximum, outperforming prior works by 9.5×. It records 0.35 μs and 0.59 μs latency for remote write and read operations, respectively. Finally, we conduct a case study on two Intel D5005 FPGA nodes integrating Intel's deep learning accelerator. The two-node system programmed by FSHMEM achieves 1.94× and 1.98× speedup for matrix multiplication and convolution operation, respectively, showing its scalability potential for HPC infrastructure.
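
To make the one-sided communication idea concrete, here is a toy software model of a partitioned global address space. It is only a sketch: `ToyPGAS`, `put`, and `get` are hypothetical names chosen for illustration and are not the GASNet or FSHMEM interfaces, and the address layout (contiguous, equal-sized partitions) is an assumption.

```python
class ToyPGAS:
    """Toy model of a partitioned global address space (illustrative only)."""

    def __init__(self, num_nodes, words_per_node):
        self.words_per_node = words_per_node
        # Each node owns one equal-sized partition of the global space.
        self.partitions = [[0] * words_per_node for _ in range(num_nodes)]

    def _locate(self, global_addr):
        # Split a global address into (owning node, local offset).
        return divmod(global_addr, self.words_per_node)

    def put(self, global_addr, value):
        # One-sided write: the initiator addresses remote memory directly;
        # the owning node issues no matching receive call.
        node, offset = self._locate(global_addr)
        self.partitions[node][offset] = value

    def get(self, global_addr):
        # One-sided read from whichever partition owns the address.
        node, offset = self._locate(global_addr)
        return self.partitions[node][offset]


space = ToyPGAS(num_nodes=2, words_per_node=1024)
space.put(1024 + 5, 42)        # lands in node 1 at local offset 5
print(space.get(1024 + 5))
```

The contrast with message passing is that only the initiating side calls anything; there is no paired send and receive.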

Citations: 1

H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-06-28 DOI: 10.1109/FPL57034.2022.00040

Chengming Zhang, Tong Geng, Anqi Guo, Jiannan Tian, Martin C. Herbordt, Ang Li, Dingwen Tao

Abstract: Graph Neural Networks (GNNs) have drawn tremendous attention due to their unique capability to extend Machine Learning (ML) approaches to applications broadly defined as having unstructured data, especially graphs. Compared with other ML modalities, the acceleration of GNNs is more challenging due to the irregularity and heterogeneity derived from graph typologies. Existing efforts, however, have focused mainly on handling graphs' irregularity and have not studied their heterogeneity. To this end we propose H-GCN, a PL (Programmable Logic) and AIE (AI Engine) based hybrid accelerator that leverages the emerging heterogeneity of Xilinx Versal Adaptive Compute Acceleration Platforms (ACAPs) to achieve high-performance GNN inference. In particular, H-GCN partitions each graph into three subgraphs based on its inherent heterogeneity, and processes them using PL and AIE, respectively. To further improve performance, we explore the sparsity support of AIE and develop an efficient density-aware method to automatically map tiles of sparse matrix-matrix multiplication (SpMM) onto the systolic tensor array. Compared with state-of-the-art GCN accelerators, H-GCN achieves, on average, speedups of 1.1× to 2.3×.
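
As a rough illustration of what a density-aware tile mapping could involve, the sketch below computes the fraction of nonzeros in each tile of a small matrix and routes tiles by a threshold. This is a generic example, not the method from the paper: the function names, the 0.5 threshold, and the two-way "dense"/"sparse" split are all assumptions made here.

```python
def tile_densities(matrix, tile):
    """Return {(tile_row, tile_col): fraction of nonzeros} for a 2-D list."""
    rows, cols = len(matrix), len(matrix[0])
    densities = {}
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            block = [matrix[r][c]
                     for r in range(r0, min(r0 + tile, rows))
                     for c in range(c0, min(c0 + tile, cols))]
            nonzeros = sum(1 for v in block if v != 0)
            densities[(r0 // tile, c0 // tile)] = nonzeros / len(block)
    return densities


def assign_tiles(densities, threshold=0.5):
    # Hypothetical policy: tiles at or above the threshold go to a dense
    # engine, all others to a sparse one.
    return {pos: ("dense" if d >= threshold else "sparse")
            for pos, d in densities.items()}


adjacency = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
assignment = assign_tiles(tile_densities(adjacency, tile=2))
print(assignment)
```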

Citations: 10

EmuNoC: Hybrid Emulation for Fast and Flexible Network-on-Chip Prototyping on FPGAs

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-06-23 DOI: 10.1109/FPL57034.2022.00058

Y. Y. Tan, Felix Staudigl, Lukas Jünger, Anna Drewes, R. Leupers, J. Joseph

Citations: 0

Real-Time Waveform Matching with a Digitizer at 10 GS/s

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-06-21 DOI: 10.1109/FPL57034.2022.00025

Jens Trautmann, Nikolaos Patsiatzis, Andreas Becher, J. Teich, S. Wildermann

Abstract: Side-Channel Analysis (SCA) requires the detection of the specific time frame within which Cryptographic Operations (COs) take place in the side-channel signal. In laboratory conditions with full control over the Device under Test (DuT), dedicated trigger signals can be implemented to indicate the start and end of COs. For real-world scenarios, waveform-matching techniques have been established which compare the side-channel signal with a template of the CO's pattern in real time to detect the CO in the side channel. State-of-the-art approaches are implemented on Field-Programmable Gate Arrays (FPGAs). However, current waveform-matching designs process the samples from Analog-to-Digital Converters (ADCs) sequentially and can only work with low sampling rates due to the limited clock speed of FPGAs. This makes it increasingly difficult to apply existing techniques on modern DuTs that operate with clock speeds in the GHz range. In this paper, we present a parallel waveform-matching architecture that is capable of performing waveform matching at the speed of fast ADCs. We implement the proposed architecture in a high-end FPGA-based digitizer and deploy it to detect AES COs from the side channel of a single-board computer operating at 1 GHz. Our implementation allows for waveform matching at 10 GS/s with high accuracy, thus offering a speedup of 50× compared to the fastest state-of-the-art implementation known to us.
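
The idea of comparing a signal against a template at several offsets per step can be sketched in software as follows. This is a minimal model under stated assumptions: the sum of absolute differences (SAD) metric, the `lanes` parameter, and the sample data are chosen here for illustration, since the abstract does not name the distance metric or the degree of parallelism used.

```python
def sad(window, template):
    """Sum of absolute differences between a signal window and the template."""
    return sum(abs(s - t) for s, t in zip(window, template))


def match_offsets(signal, template, threshold, lanes=4):
    """Return offsets where the template matches within `threshold`.

    Each outer iteration models one step in which `lanes` window positions
    are evaluated side by side rather than one after another.
    """
    last = len(signal) - len(template)
    hits = []
    for base in range(0, last + 1, lanes):
        for offset in range(base, min(base + lanes, last + 1)):
            window = signal[offset:offset + len(template)]
            if sad(window, template) <= threshold:
                hits.append(offset)
    return hits


# Made-up data: one exact occurrence and one slightly distorted occurrence.
template = [0, 3, 7, 3, 0]
signal = [1, 0, 1, 0, 3, 7, 3, 0, 1, 0, 0, 3, 6, 3, 0, 1]
hits = match_offsets(signal, template, threshold=1)
print(hits)
```

In hardware the inner loop would be unrolled into parallel comparators; the result does not depend on the value of `lanes`, only the number of steps does.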

Citations: 1

GraphScale: Scalable Bandwidth-Efficient Graph Processing on FPGAs

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-06-16 DOI: 10.1109/FPL57034.2022.00016

Jonas Dann, Daniel Ritter, H. Fröning

Citations: 3

Half Title Page

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-06-01 DOI: 10.1109/fpl57034.2022.00001

Citations: 0

DSP-Packing: Squeezing Low-precision Arithmetic into FPGA DSP Blocks

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-03-21 DOI: 10.1109/FPL57034.2022.00035

J. Sommer, Akif Özkan, Oliver Keszocze, Jürgen Teich

Abstract: The number of Digital Signal Processor (DSP) resources available in Field Programmable Gate Arrays (FPGAs) is often quite limited. Therefore, full utilization of available DSP resources for the computationally intensive parts of an algorithm is paramount for optimizing the non-functional properties of an implementation (i.e., performance, power, and area). The DSPs available in Xilinx devices implement large bit width operators (e.g., a 48-bit accumulator or an 18 × 27 multiplier). However, using such a DSP for low-precision quantized data (as is common in image processing or machine learning applications) leaves the DSP resources underutilized. As a remedy, a method has been proposed to pack and compute four 4-bit multiplications on a single DSP in a single clock cycle. This paper presents a generalization of this scheme to arbitrary bit widths and numbers of multiplications. We also demonstrate that the previously proposed approach leads to errors (Mean Absolute Error (MAE) = 0.37). Furthermore, we explain where these errors come from and how they can be corrected. In addition, we introduce a novel approximate method called “Overpacking”, which squeezes even more multiplications into a single DSP at the cost of small errors (MAE = 0.47). Overpacking fits six 4-bit multiplications into a single DSP, compared to just four in the literature. Finally, we introduce an alternative method for packing multiple small-bit-width additions into a single 48-bit accumulator for use in applications such as Spiking Neural Networks.
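
The basic packing idea (four small multiplications from one wide multiplication) can be illustrated in software. The sketch below is restricted to unsigned 4-bit operands and uses 8-bit result fields; both choices are assumptions made here to keep the example exact, and `packed_multiply` is a hypothetical name. It is not the scheme from the paper, which also covers other bit widths and the error behaviour of earlier approaches.

```python
def packed_multiply(a0, a1, w0, w1, field_bits=8):
    """Compute four unsigned 4-bit products with a single wide multiplication.

    Returns [a0*w0, a1*w0, a0*w1, a1*w1].
    """
    for v in (a0, a1, w0, w1):
        if not 0 <= v < 16:
            raise ValueError("operands must be unsigned 4-bit values")
    a_packed = a0 | (a1 << field_bits)           # 12-bit operand
    w_packed = w0 | (w1 << (2 * field_bits))     # 20-bit operand
    # One multiplication yields all four partial products, each in its own
    # 8-bit field: a0*w0 | a1*w0 << 8 | a0*w1 << 16 | a1*w1 << 24.
    product = a_packed * w_packed
    mask = (1 << field_bits) - 1
    return [(product >> (i * field_bits)) & mask for i in range(4)]


print(packed_multiply(3, 5, 7, 9))
```

Because each unsigned product is at most 15 × 15 = 225, it fits in its 8-bit field without carrying into the next one. With two's-complement operands a negative field borrows from the field above it, so this plain shift-and-mask extraction would need a correction step.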

Citations: 4

SAMO: Optimised Mapping of Convolutional Neural Networks to Streaming Architectures

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2021-11-30 DOI: 10.1109/FPL57034.2022.00069

Alexander Montgomerie-Corcoran, Zhewen Yu, C. Bouganis

Abstract: Significant effort has been placed on the development of toolflows that map Convolutional Neural Network (CNN) models to Field Programmable Gate Arrays (FPGAs) with the aim of automating the production of high performance designs for a diverse set of applications. However, within these toolflows, the problem of finding an optimal mapping is often overlooked, with the expectation that the end user will tune their generated hardware for their desired platform. This is particularly prominent within Streaming Architecture toolflows, where there is a large design space to be explored. In this work, we establish the framework SAMO: a Streaming Architecture Mapping Optimiser. SAMO exploits the structure of CNN models and the common features that exist in Streaming Architectures, and casts the mapping optimisation problem under a unified methodology. Furthermore, SAMO explicitly explores the re-configurability property of FPGAs, allowing the methodology to overcome mapping limitations imposed by certain toolflows under resource-constrained scenarios, as well as improve on the achievable throughput. Three optimisation methods (Brute-Force, Simulated Annealing and Rule-Based) have been developed in order to generate valid, high performance designs for a range of target platforms and CNN models. Results show that SAMO-optimised designs can achieve 4× to 20× better performance compared to existing hand-tuned designs. The SAMO framework is open-source: https://github.com/AlexMontgomerie/samo.
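
Simulated annealing, one of the three optimisation methods named above, can be sketched on a toy mapping problem. Everything about the problem here is an assumption made for illustration and not SAMO's actual interface or cost model: each pipeline stage gets a parallelism factor, resource use is modelled as the sum of the factors, and latency is the workload of the slowest stage divided by its factor.

```python
import math
import random


def bottleneck_latency(workloads, factors):
    # In a streaming pipeline the slowest stage bounds throughput.
    return max(w / f for w, f in zip(workloads, factors))


def anneal(workloads, budget, choices=(1, 2, 4, 8), steps=2000, seed=0):
    """Search for parallelism factors minimising latency within `budget`."""
    rng = random.Random(seed)
    current = [1] * len(workloads)             # minimal-resource start point
    cost = bottleneck_latency(workloads, current)
    best, best_cost = list(current), cost
    temperature = cost
    for _ in range(steps):
        candidate = list(current)
        candidate[rng.randrange(len(candidate))] = rng.choice(choices)
        if sum(candidate) > budget:            # hypothetical resource model
            continue
        new_cost = bottleneck_latency(workloads, candidate)
        # Always accept improvements; accept regressions with a probability
        # that shrinks as the temperature drops.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temperature):
            current, cost = candidate, new_cost
            if cost < best_cost:
                best, best_cost = list(current), cost
        temperature = max(temperature * 0.995, 1e-6)
    return best, best_cost


workloads = [80, 40, 20, 10]
factors, latency = anneal(workloads, budget=15)
print(factors, latency)
```

For this toy instance no feasible assignment can beat a latency of 10, since the largest workload is 80 and the largest factor is 8; whether a given run reaches that bound depends on the seed and the cooling schedule.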

Citations: 8