2022 32nd International Conference on Field-Programmable Logic and Applications (FPL): Latest Papers

The Design Method of Logic Circuits based on the Voltage-Input Enhanced Scouting Logic Gates
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00031
Fan Liu, S. Zhang, Xiaole Cui
Abstract: The Enhanced Scouting Logic (ESL) is a memristive logic gate family with low sensitivity to resistance variation and high device endurance. This work studies design methods for logic circuits based on Voltage-Input Enhanced Scouting Logic (VIESL) gates. Both single-array and dual-array synthesis methods are proposed. The read/write separation technique of VIESL gates facilitates pipelined logic operations. Synthesis results on the benchmarks show that circuits generated by the proposed single-array synthesis method perform best among their counterparts, and that the dual-array synthesis method effectively reduces the cell count.
Citations: 0
Direct Device-to-Device Physical Page Migrations in Multi-FPGA Shared Virtual Memory Systems
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00043
Torben Kalkhof, A. Koch
Abstract: Shared Virtual Memory (SVM) is a proven approach to simplify the programming of heterogeneous computing systems. It enables a single virtual address space across all computing devices, even for systems having Non-Uniform Memory Accesses (NUMA) across devices. Access time spikes due to NUMA can be reduced, though, by performing physical page migrations in SVM. These migrations ensure high data locality by moving the underlying memory pages close to the computing device currently working on the contained data, and allow the devices to fault-in pages from remote to local memories autonomously. The main contribution of this work is the implementation of an open-source framework enabling scalable SVM for multi-FPGA architectures, and providing efficient device-to-device page migrations. We compare the runtime of on-demand and user-managed migrations, and examine three different communication mechanisms for the actual board-to-board data transfers. Our framework supports both low-latency and high-throughput operations, requiring, e.g., only 11.6 μs to migrate a single 4 kB page between physical memories on different boards, and 760 μs to migrate an entire 4 MB range of memory.
Citations: 0
FSHMEM: Supporting Partitioned Global Address Space on FPGAs for Large-Scale Hardware Acceleration Infrastructure
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-07-11 DOI: 10.1109/FPL57034.2022.00042
Yashael F. Arthanto, David Ojika, Joo-Young Kim
Abstract: By providing highly efficient one-sided communication with a globally shared memory space, Partitioned Global Address Space (PGAS) has become one of the most promising parallel computing models in high-performance computing (HPC). Meanwhile, FPGAs are getting attention as an alternative compute platform for HPC systems with the benefit of custom computing and design flexibility. However, unlike the traditional message passing interface, PGAS has not been explored on FPGAs. This paper proposes FSHMEM, a software/hardware framework that enables the PGAS programming model on FPGAs. We implement the core functions of the GASNet specification on FPGA for native PGAS integration in hardware, while its programming interface is designed to be highly compatible with legacy software. Our experiments show that FSHMEM achieves a peak bandwidth of 3813 MB/s, which is more than 95% of the theoretical maximum, outperforming prior works by 9.5×. It records 0.35 μs and 0.59 μs latency for remote write and read operations, respectively. Finally, we conduct a case study on two Intel D5005 FPGA nodes integrating Intel's deep learning accelerator. The two-node system programmed by FSHMEM achieves 1.94× and 1.98× speedup for matrix multiplication and convolution operation, respectively, showing its scalability potential for HPC infrastructure.
Citations: 1
H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-06-28 DOI: 10.1109/FPL57034.2022.00040
Chengming Zhang, Tong Geng, Anqi Guo, Jiannan Tian, Martin C. Herbordt, Ang Li, Dingwen Tao
Abstract: Graph Neural Networks (GNNs) have drawn tremendous attention due to their unique capability to extend Machine Learning (ML) approaches to applications broadly defined as having unstructured data, especially graphs. Compared with other ML modalities, the acceleration of GNNs is more challenging due to the irregularity and heterogeneity derived from graph typologies. Existing efforts, however, have focused mainly on handling graphs' irregularity and have not studied their heterogeneity. To this end we propose H-GCN, a PL (Programmable Logic) and AIE (AI Engine) based hybrid accelerator that leverages the emerging heterogeneity of Xilinx Versal Adaptive Compute Acceleration Platforms (ACAPs) to achieve high-performance GNN inference. In particular, H-GCN partitions each graph into three subgraphs based on its inherent heterogeneity, and processes them using PL and AIE, respectively. To further improve performance, we explore the sparsity support of AIE and develop an efficient density-aware method to automatically map tiles of sparse matrix-matrix multiplication (SpMM) onto the systolic tensor array. Compared with state-of-the-art GCN accelerators, H-GCN achieves, on average, speedups of 1.1× to 2.3×.
Citations: 10
EmuNoC: Hybrid Emulation for Fast and Flexible Network-on-Chip Prototyping on FPGAs
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-06-23 DOI: 10.1109/FPL57034.2022.00058
Y. Y. Tan, Felix Staudigl, Lukas Jünger, Anna Drewes, R. Leupers, J. Joseph
Abstract: Networks-on-Chips (NoCs) recently became widely used, from multi-core CPUs to edge-AI accelerators. Emulation on FPGAs promises to accelerate their RTL modeling compared to slow simulations. However, realistic test stimuli are challenging to generate in hardware for diverse applications. In other words, a design framework that is both fast and flexible is required. The most promising solution is hybrid emulation, in which parts of the design are simulated in software, and the other parts are emulated in hardware. This paper proposes a novel hybrid emulation framework called EmuNoC. We introduce a clock-synchronization method and software-only packet generation that improve the emulation speed by 36.3× to 79.3× over state-of-the-art frameworks while retaining the flexibility of a pure-software interface for stimuli simulation. We also increased the area efficiency to model an NoC with up to 169 routers on a single FPGA, while previous frameworks only achieved 64 routers.
Citations: 0
Real-Time Waveform Matching with a Digitizer at 10 GS/s
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-06-21 DOI: 10.1109/FPL57034.2022.00025
Jens Trautmann, Nikolaos Patsiatzis, Andreas Becher, J. Teich, S. Wildermann
Abstract: Side-Channel Analysis (SCA) requires the detection of the specific time frame within which Cryptographic Operations (COs) take place in the side-channel signal. In laboratory conditions with full control over the Device under Test (DuT), dedicated trigger signals can be implemented to indicate the start and end of COs. For real-world scenarios, waveform-matching techniques have been established which compare the side-channel signal with a template of the CO's pattern in real time to detect the CO in the side channel. State-of-the-art approaches are implemented on Field-Programmable Gate Arrays (FPGAs). However, current waveform-matching designs process the samples from Analog-to-Digital Converters (ADCs) sequentially and can only work with low sampling rates due to the limited clock speed of FPGAs. This makes it increasingly difficult to apply existing techniques on modern DuTs that operate with clock speeds in the GHz range. In this paper, we present a parallel waveform-matching architecture that is capable of performing waveform matching at the speed of fast ADCs. We implement the proposed architecture in a high-end FPGA-based digitizer and deploy it to detect AES COs from the side channel of a single-board computer operating at 1 GHz. Our implementation allows for waveform matching at 10 GS/s with high accuracy, thus offering a speedup of 50× compared to the fastest state-of-the-art implementation known to us.
Citations: 1
GraphScale: Scalable Bandwidth-Efficient Graph Processing on FPGAs
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-06-16 DOI: 10.1109/FPL57034.2022.00016
Jonas Dann, Daniel Ritter, H. Fröning
Abstract: Recent advances in graph processing on FPGAs promise to alleviate performance bottlenecks with irregular memory access patterns. Such bottlenecks challenge performance for a growing number of important application areas like machine learning and data analytics. While FPGAs denote a promising solution through flexible memory hierarchies and massive parallelism, we argue that current graph processing accelerators either use the off-chip memory bandwidth inefficiently or do not scale well across memory channels. In this work, we propose GraphScale, a scalable graph processing framework for FPGAs. For the first time, GraphScale combines multi-channel memory with asynchronous graph processing (i.e., for fast convergence on results) and a compressed graph representation (i.e., for efficient usage of memory bandwidth and reduced memory footprint). GraphScale solves common graph problems like breadth-first search, PageRank, and weakly-connected components through modular user-defined functions, a novel two-dimensional partitioning scheme, and a high-performance two-level crossbar design.
Citations: 3
Half Title Page
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-06-01 DOI: 10.1109/fpl57034.2022.00001
Citations: 0
DSP-Packing: Squeezing Low-precision Arithmetic into FPGA DSP Blocks
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-03-21 DOI: 10.1109/FPL57034.2022.00035
J. Sommer, Akif Özkan, Oliver Keszocze, Jürgen Teich
Abstract: The number of Digital Signal Processor (DSP) resources available in Field Programmable Gate Arrays (FPGAs) is often quite limited. Therefore, full utilization of available DSP resources for the computationally intensive parts of an algorithm is paramount for optimizing the non-functional properties of an implementation (i.e., performance, power, and area). The DSPs available in Xilinx devices implement large bit width operators (i.e., a 48-bit accumulator or an 18 × 27 multiplier). However, using such a DSP for low-precision quantized data (as is common in image processing or machine learning applications) leaves the DSP resources underutilized. As a remedy, a method has been proposed to pack and compute four 4-bit multiplications on a single DSP in a single clock cycle. This paper presents a generalization of this scheme to arbitrary bit widths and numbers of multiplications. We also demonstrate that the previously proposed approach leads to errors (Mean Absolute Error (MAE) = 0.37). Furthermore, we explain where these errors come from and how they can be corrected. In addition, we introduce a novel approximate method called "Overpacking", which squeezes even more multiplications into a single DSP at the cost of small errors (MAE = 0.47). Overpacking allows six 4-bit multiplications to be squeezed into a single DSP, compared to just four in the literature. Finally, we introduce an alternative method for packing multiple small-bit width additions into a single 48-bit accumulator for use in applications such as Spiking Neural Networks.
Citations: 4
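Illustration (not from the paper): the basic idea of packing four 4-bit multiplications into one wide multiplication can be sketched in a few lines of Python. This is a simplified sketch under loud assumptions: operands are unsigned, the 8-bit field spacing is chosen here for convenience, and the function name `pack_and_multiply` is hypothetical. The signed-operand handling, the error sources, and the Overpacking method described in the abstract are not modeled. With this spacing the packed operands are 12 and 20 bits wide, which is within the 18 × 27 multiplier size quoted in the abstract.

```python
def pack_and_multiply(a0, a1, w0, w1):
    """Return [a0*w0, a1*w0, a0*w1, a1*w1] using one wide multiplication.

    Assumption: all operands are unsigned 4-bit values (0..15).
    """
    for v in (a0, a1, w0, w1):
        if not 0 <= v < 16:
            raise ValueError("operands must be unsigned 4-bit values")

    shift = 8  # each product is at most 15 * 15 = 225 < 2**8, so fields never overlap

    packed_a = a0 | (a1 << shift)        # 12-bit packed operand
    packed_w = w0 | (w1 << (2 * shift))  # 20-bit packed operand

    # (a0 + a1*2^8) * (w0 + w1*2^16)
    #   = a0*w0 + a1*w0*2^8 + a0*w1*2^16 + a1*w1*2^24
    wide = packed_a * packed_w

    mask = (1 << shift) - 1
    return [(wide >> (i * shift)) & mask for i in range(4)]


if __name__ == "__main__":
    print(pack_and_multiply(3, 5, 7, 9))
```

Because every partial product fits in its own 8-bit field, no carries cross field boundaries and the unsigned results are exact; tighter spacing or signed operands would require the kind of correction the abstract refers to.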
SAMO: Optimised Mapping of Convolutional Neural Networks to Streaming Architectures
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2021-11-30 DOI: 10.1109/FPL57034.2022.00069
Alexander Montgomerie-Corcoran, Zhewen Yu, C. Bouganis
Abstract: Significant effort has been placed on the development of toolflows that map Convolutional Neural Network (CNN) models to Field Programmable Gate Arrays (FPGAs) with the aim of automating the production of high performance designs for a diverse set of applications. However, within these toolflows, the problem of finding an optimal mapping is often overlooked, with the expectation that the end user will tune their generated hardware for their desired platform. This is particularly prominent within Streaming Architecture toolflows, where there is a large design space to be explored. In this work, we establish the framework SAMO: a Streaming Architecture Mapping Optimiser. SAMO exploits the structure of CNN models and the common features that exist in Streaming Architectures, and casts the mapping optimisation problem under a unified methodology. Furthermore, SAMO explicitly explores the re-configurability property of FPGAs, allowing the methodology to overcome mapping limitations imposed by certain toolflows under resource-constrained scenarios, as well as improve on the achievable throughput. Three optimisation methods (Brute-Force, Simulated Annealing and Rule-Based) have been developed in order to generate valid, high performance designs for a range of target platforms and CNN models. Results show that SAMO-optimised designs can achieve 4× to 20× better performance compared to existing hand-tuned designs. The SAMO framework is open-source: https://github.com/AlexMontgomerie/samo.
Citations: 8