2022 32nd International Conference on Field-Programmable Logic and Applications (FPL): Latest Papers

A Framework for Neural Network Inference on FPGA-Centric SmartNICs
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00071
Anqi Guo, Tong Geng, Yongan Zhang, Pouya Haghi, Chunshu Wu, Cheng Tan, Yingyan Lin, Ang Li, Martin C. Herbordt
Abstract: FPGA-based SmartNICs offer great potential to significantly improve the performance of high-performance computing and warehouse data processing by tightly coupling support for reconfigurable, data-intensive computation with cross-node communication, thereby mitigating the von Neumann bottleneck. Existing work, however, has generally been limited in that it assumes an accelerator model in which kernels are offloaded to SmartNICs while most control tasks are left to the CPUs. This leads to frequent waiting, reduced performance, and scaling challenges. In this work, we propose a new distributive, data-centric computing framework named FCsN for reconfigurable SmartNIC-based systems. Through a lightweight task-circulation execution model and its implementation architecture, FCsN allows NN kernel execution, control logic, system scheduling, and network communication to be completely detached to the SmartNICs. This boosts performance by (i) avoiding control dependency on the CPUs and (ii) supporting streaming NN kernel execution and network communication at line rate and in a very fine-grained manner. We demonstrate the efficiency and flexibility of FCsN using various types of neural network kernels and applications, including deep neural networks (DNNs) and graph neural networks (GNNs); because the latter are both irregular and data intensive, they offer an especially robust demonstration. Evaluations using commonly used neural network models and graph datasets show that a system with FCsN can achieve 10× speedups over MPI-based standard CPU baselines.
Citations: 5
GRAEBO: FPGA General Routing Architecture Exploration via Bayesian Optimization
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00050
Su Zheng, Jiadong Qian, Hao Zhou, Lingli Wang
Abstract: Modern FPGAs utilize complex routing architectures to optimize area, critical path delay, and power consumption. The General Routing Block (GRB) models the routing resources of modern FPGAs, enabling the design of better routing architectures than previous academic FPGAs based on the CB-SB model. However, the design space of the GRB model is too large to be explored manually. In this paper, we propose GRAEBO, a design space exploration (DSE) algorithm for FPGA routing architectures based on Bayesian optimization, which can optimize and accelerate the DSE by balancing exploration and exploitation. Moreover, we design pruning rules to further improve the DSE efficiency, which can serve as a multi-fidelity acceleration method. GRAEBO obtains better area, delay, and area-delay product than a 142-channel baseline CB-SB architecture, with improvements of 8%, 19%, and 26%, respectively. Compared to the GRB architecture found by the simulated annealing algorithm, GRAEBO achieves 9% smaller area, 5% shorter delay, and 13% better area-delay product on the VTR benchmarks.
Citations: 1
Maia: Matrix Inversion Acceleration Near Memory
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00049
Bahar Asgari, Dheeraj Ramchandani, Amaan Marfatia, Hyesoon Kim
Abstract: Matrix inversion is an essential and challenging operation in several application domains, such as scientific computing, social networks, and recommendation systems. Since matrix inversion is a memory-bound task, it has the potential of being implemented near memory to efficiently use high memory bandwidth. However, data-dependency patterns in the common matrix-inversion algorithms limit memory bandwidth utilization. To minimize the negative impact of such dependencies on performance, we propose matrix inversion acceleration (Maia), a near-memory FPGA-based implementation of matrix inversion that converts the mathematical dependencies to gate-level dependencies and thus reduces the critical-path latency. We implement and evaluate Maia on a high-end Xilinx UltraScale+ xcu280 FPGA connected to a high-bandwidth memory (HBM2), targeting the data-center Alveo U280 boards. Maia performs matrix inversion 4× faster than a baseline FPGA implementation without the proposed techniques for resolving dependencies.
Citations: 0
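To make the data dependencies mentioned in the Maia abstract concrete, here is a minimal sketch of textbook Gauss-Jordan inversion. It is not the Maia design: the function name `invert` and the use of exact fractions are assumptions chosen for this illustration. Each pivot step reads rows that the previous step has just rewritten, which is one example of the serial dependency chain found in common inversion algorithms.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination (illustrative only)."""
    n = len(matrix)
    # Build the augmented matrix [A | I] using exact rational arithmetic.
    aug = [
        [Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Pick the first row at or below the diagonal with a non-zero entry.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Normalise the pivot row; this depends on all earlier eliminations.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate the column from every other row using the new pivot row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]


inverse = invert([[2, 1], [1, 1]])
```

The loop over `col` cannot start step `col + 1` until step `col` has finished updating every row, so the work inside one step is parallel while the steps themselves are sequential.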
A Flexible Real-Time Stereo Vision Architecture for Multiple Data Streams with Runtime Configurable Parameters
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00024
Zhaoteng Meng, L. Shu, Jie Hao
Abstract: It is important for a real-time stereo vision computing system to adapt flexibly to different stereo matching parameters without re-customizing the hardware. In this paper, a configurable pipelined hardware architecture based on the sum of absolute differences (SAD) algorithm is proposed. We split the SAD calculation into two parts to accommodate pipelined computing. The architecture can be configured with different resolutions, window sizes, and disparity levels without stopping and restarting. In addition, it can be configured in a multiple-data-stream mode, and we have developed a configuration generation algorithm for this mode. The presented architecture is synthesized and implemented on a Xilinx ZCU104 board. The evaluation results demonstrate that 480P, 720P, and 1080P video streams can be processed in real time at 250 MHz, with a peak computing performance of 480P/784 fps at a disparity level of 125. It uses 60% of LUTs, 34% of registers, and 39% of BRAM, offering flexible configurability and higher computing performance than other similar work.
Citations: 0
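The sum of absolute differences (SAD) matching named in the stereo vision abstract can be sketched in software as follows. This is a plain reference version, not the pipelined two-part split the authors describe; the function names `sad` and `best_disparity`, the `half` window parameter, and the synthetic test images are all assumptions made for this example.

```python
def sad(left, right, row, col, disparity, half):
    """SAD between a window in the left image and the same window
    shifted by `disparity` pixels in the right image."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[row + dy][col + dx]
                         - right[row + dy][col + dx - disparity])
    return total


def best_disparity(left, right, row, col, max_disparity, half=1):
    """Return the disparity in [0, max_disparity] with the lowest SAD cost."""
    costs = []
    for d in range(max_disparity + 1):
        if col - half - d < 0:  # shifted window would leave the right image
            break
        costs.append(sad(left, right, row, col, d, half))
    return costs.index(min(costs))


# Synthetic pair: the left image is the right image shifted by 2 pixels.
right = [[(3 * c * c + 5 * r) % 17 for c in range(10)] for r in range(5)]
left = [[row[c - 2] if c >= 2 else 0 for c in range(10)] for row in right]

disparity = best_disparity(left, right, row=2, col=6, max_disparity=4)  # 2
```

The window size (`half`) and the disparity range (`max_disparity`) are the kind of parameters the abstract says can be changed at runtime; here they are ordinary function arguments.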
High Performance FPGA-based Post Quantum Cryptography Implementations
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00076
Ziying Ni, A. Khalid, Máire O’Neill
Abstract: Post-quantum cryptography (PQC) is an umbrella term for cryptographic schemes based on hard mathematical problems that are resistant to attacks by quantum computers. The National Institute of Standards and Technology (NIST) initiated a PQC standardisation process in 2017, with a total of 4 algorithms selected for standardisation after Round 3 and 4 carried forward for further analysis in Round 4 in 2022. PQC schemes on hardware devices, such as Field Programmable Gate Arrays (FPGAs), show the potential of higher throughput performance, for comparable security, at the cost of high area and power consumption. The major aim of this thesis is to help facilitate the global transition to a post-quantum secure set of security protocols. This thesis will focus on the optimisation of the hardware architectures to improve the computational speed and reduce the area overhead. The side-channel analysis vulnerabilities and their countermeasures will also be studied.
Citations: 1
Reducing FPGA Memory Footprint of Stencil Codes through Automatic Extraction of Memory Patterns
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00033
Robert Szafarczyk, S. Nabi, W. Vanderbauwhede
Abstract: FPGAs are attractive for scientific high-performance computing due to their potential for high performance-per-Watt. Stencil codes in scientific applications are difficult to optimize on FPGAs because of redundant, non-contiguous memory accesses to relatively low-bandwidth DRAM. In this paper, we present an algorithm to aggressively reduce on-chip block RAM (BRAM) and off-chip DRAM utilisation of stencil codes running on FPGAs. The algorithm extracts memory accesses from computational pipelines and removes all redundant intermediate arrays, including those used for stencil buffering, by trading DRAM accesses for computation. The algorithm is based on rewrite rules on a strict functional representation derived from Fortran code and generates provably correct, optimized code. Typical FPGA implementations store the stencil window in on-chip shift registers implemented in BRAMs; we use only DRAM and optimize the memory accesses instead. Our approach dramatically reduces BRAM usage so that the domain size is only limited by available DRAM. We report a drop of 78% and 18% in BRAM usage in 3-D and 2-D stencil codes, respectively, compared to a manual implementation using shift registers, while staying competitive in performance or even improving performance-per-Watt.
Citations: 0
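The contrast drawn in the stencil abstract, between buffering the stencil window in a shift register and reading operands straight from memory, can be illustrated with a toy 1-D three-point stencil. This is only a software analogy under stated assumptions: the function names, the three-point sum, and the use of a `deque` to stand in for an on-chip shift register are hypothetical and are not taken from the paper.

```python
from collections import deque


def stencil_shift_register(data):
    """Three-point sum using a sliding window buffer (shift-register style).

    Each input element is read once; the window holds the reuse data.
    """
    window = deque(maxlen=3)
    out = []
    for x in data:
        window.append(x)  # oldest element falls out automatically
        if len(window) == 3:
            out.append(sum(window))
    return out


def stencil_direct(data):
    """Three-point sum with no intermediate buffer.

    Every output re-reads its three operands from the source array,
    so there are more memory accesses but no window storage.
    """
    return [data[i - 1] + data[i] + data[i + 1]
            for i in range(1, len(data) - 1)]


samples = [1, 4, 9, 16, 25]
buffered = stencil_shift_register(samples)
unbuffered = stencil_direct(samples)
```

Both functions produce the same output; they differ only in where the neighbouring values come from, which is the storage-versus-access trade-off the abstract describes at the level of BRAM and DRAM.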
Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00027
Z. Li, Mengshu Sun, Alec Lu, Haoyu Ma, Geng Yuan, Yanyue Xie, Hao Tang, Yanyu Li, M. Leeser, Zhangyang Wang, Xue Lin, Zhenman Fang
Abstract: Vision transformers (ViTs) are emerging with significantly improved accuracy in computer vision tasks. However, their complex architecture and enormous computation/storage demand impose urgent needs for new hardware accelerator design methodology. This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization. To the best of our knowledge, this is the first FPGA-based ViT acceleration framework exploring model quantization. Compared with state-of-the-art ViT quantization work (algorithmic approach only, without hardware acceleration), our quantization achieves 0.47% to 1.36% higher Top-1 accuracy under the same bit-width. Compared with the 32-bit floating-point baseline FPGA accelerator, our accelerator achieves around 5.6× improvement in frame rate (i.e., 56.8 FPS vs. 10.0 FPS) with a 0.71% accuracy drop on the ImageNet dataset for DeiT-base.
Citations: 12
FPL Demo: Kyokko - An Aurora 64b66b compatible 100 Gbps Communication Controller
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00088
A. Tomori, Yasunori Osana
Abstract: Kyokko is an open, vendor-independent implementation of Xilinx's Aurora 64b66b protocol. It provides interoperability between Xilinx and Intel FPGAs over high-speed serial links such as optical, coaxial, or SFP cables. Currently, it works on Kintex, Virtex, Cyclone, and Arria FPGAs with lower resource requirements and latency than Xilinx's Aurora 64b66b core. We will give an online demonstration of connectivity among these FPGAs.
Citations: 1
Speeding Up Optimal Modulo Scheduling with Rational Initiation Intervals
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00056
Nicolai Fiege, Patrick Sittel, P. Zipf
Abstract: Compared to integer initiation intervals (IIs), rational IIs improve the throughput achieved by loop pipelining in many cases. This comes at the expense of a higher need for data path elements (i.e., multiplexers and registers) and the need to solve more complex scheduling problems. To optimally solve these problems, we improved an existing ILP formulation for latency-optimal modulo scheduling with rational IIs that now finds 6.08× more solutions and 6.10× as many optimal ones within the same time budget. Compared to the best alternative from previous work, our improved algorithm finds 1.15× more solutions and 2.97× as many optimal ones.
Citations: 2
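The throughput benefit of rational initiation intervals claimed in the modulo scheduling abstract can be seen from a small worked example. The sketch below assumes the usual resource-constrained lower bound on the II (operations per iteration divided by available functional units) and ignores recurrence constraints; the function names and the 5-operations-on-3-units scenario are hypothetical and do not come from the paper.

```python
from fractions import Fraction
from math import ceil


def rational_min_ii(ops_per_iteration, units):
    """Resource-constrained lower bound on the II, kept as an exact fraction."""
    return Fraction(ops_per_iteration, units)


def integer_min_ii(ops_per_iteration, units):
    """Same bound when the II must be a whole number of cycles."""
    return ceil(rational_min_ii(ops_per_iteration, units))


def throughput_gain(ops_per_iteration, units):
    """Ratio of rational-II throughput to integer-II throughput.

    Throughput is 1 / II, so the gain is integer II divided by rational II.
    """
    return (integer_min_ii(ops_per_iteration, units)
            / rational_min_ii(ops_per_iteration, units))


# Hypothetical loop body: 5 multiplications per iteration, 3 multipliers.
rational_ii = rational_min_ii(5, 3)   # 5/3: three iterations every five cycles
integer_ii = integer_min_ii(5, 3)     # 2: one iteration every two cycles
gain = throughput_gain(5, 3)          # 6/5
```

When the operation count is a multiple of the unit count the two bounds coincide and the gain is 1, which is consistent with the abstract's wording that rational IIs help in many cases rather than all.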
FPL Demo: Logic Shrinkage: A Neural Architecture Search-Based Approach to FPGA Netlist Generation
2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00086
Marie Auffret, Erwei Wang, James J. Davis
Abstract: Logic shrinkage is an open-source, state-of-the-art neural architecture search (NAS)-based approach to the automated design of DNN inference accelerators that ideally suit FPGA deployment [1], [2]. Where NAS traditionally sees candidate functions such as convolutions automatically evaluated and selected between to form a network, logic shrinkage operates at ultra-fine granularity, resulting in a netlist of LUTs as the topology. Our results for datasets of complexity ranging from MNIST to ImageNet show area and energy efficiency gains vs. binary neural networks (BNNs) of up to ~6× and ~10×, respectively.
Citations: 0