2022 32nd International Conference on Field-Programmable Logic and Applications (FPL): Recent Papers

A Framework for Neural Network Inference on FPGA-Centric SmartNICs

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00071

Anqi Guo, Tong Geng, Yongan Zhang, Pouya Haghi, Chunshu Wu, Cheng Tan, Yingyan Lin, Ang Li, Martin C. Herbordt

Abstract: FPGA-based SmartNICs offer great potential to significantly improve the performance of high-performance computing and warehouse data processing by tightly coupling support for reconfigurable data-intensive computation with cross-node communication, thereby mitigating the von Neumann bottleneck. Existing work, however, has generally been limited in that it assumes an accelerator model where kernels are offloaded to SmartNICs with most control tasks left to the CPUs. This leads to frequent waiting, reduced performance, and scaling challenges. In this work, we propose a new distributed data-centric computing framework named FCsN for reconfigurable SmartNIC-based systems. Through a lightweight task circulation execution model and its implementation architecture, FCsN allows NN kernel execution, control logic, system scheduling, and network communication to be completely detached to the SmartNICs. This boosts performance by (i) avoiding control dependency on the CPUs and (ii) supporting streaming NN kernel execution and network communication at line rate and in a very fine-grained manner. We demonstrate the efficiency and flexibility of FCsN using various types of neural network kernels and applications, including deep neural networks (DNNs) and graph neural networks (GNNs); because the latter are both irregular and data-intensive, they offer an especially robust demonstration. Evaluations using commonly used neural network models and graph datasets show that a system with FCsN can achieve 10× speedups over the MPI-based standard CPU baselines.

Citations: 5

GRAEBO: FPGA General Routing Architecture Exploration via Bayesian Optimization

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00050

Su Zheng, Jiadong Qian, Hao Zhou, Lingli Wang

Citations: 1

Maia: Matrix Inversion Acceleration Near Memory

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00049

Bahar Asgari, Dheeraj Ramchandani, Amaan Marfatia, Hyesoon Kim

Citations: 0

A Flexible Real-Time Stereo Vision Architecture for Multiple Data Streams with Runtime Configurable Parameters

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00024

Zhaoteng Meng, L. Shu, Jie Hao

Abstract: It is important for a real-time stereo vision computing system to adapt flexibly to different stereo-matching parameters without re-customizing the hardware. In this paper, a configurable pipelined hardware architecture based on the sum of absolute differences (SAD) algorithm is proposed. We split the SAD calculation into two parts to accommodate pipelined computing. The architecture can be configured for different resolutions, window sizes, and disparity levels without stopping and restarting. In addition, it can be configured in a multiple-data-stream mode, and we have developed a configuration-generation algorithm for this mode. The presented architecture is synthesized and implemented on a Xilinx ZCU104 board. The evaluation results demonstrate that 480P, 720P, and 1080P video streams can be processed in real time at 250 MHz, with a peak computing performance of 784 fps at 480P with 125 disparity levels. The design uses 60% of the LUTs, 34% of the registers, and 39% of the BRAM, offering more flexible configurability and higher computing performance than similar prior work.
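To make the SAD idea concrete, here is a minimal winner-take-all block-matching sketch in Python. It is not the authors' hardware design: the function name `sad_disparity` and the choice to split each window sum into a horizontal (row) stage followed by a vertical (column) stage are assumptions, one plausible reading of "two parts" that maps naturally onto a pipeline. The images are assumed to be rectified grayscale frames given as lists of rows.

```python
def sad_disparity(left, right, window, max_disp):
    """Winner-take-all SAD block matching on a rectified grayscale pair.

    left, right: equal-sized images given as lists of rows of ints.
    window: odd side length of the square matching window.
    max_disp: number of disparity levels searched (0 .. max_disp - 1).
    Pixels whose window never fits keep disparity 0.
    """
    h, w = len(left), len(left[0])
    r = window // 2
    best_cost = [[None] * w for _ in range(h)]
    disp = [[0] * w for _ in range(h)]
    for d in range(max_disp):
        # Stage 1 (assumed split): per-row sums of |L(x) - R(x - d)| across the window width.
        row_sums = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(r + d, w - r):
                row_sums[y][x] = sum(
                    abs(left[y][x + k] - right[y][x + k - d]) for k in range(-r, r + 1)
                )
        # Stage 2: summing the stage-1 results down the window height gives the full SAD.
        for y in range(r, h - r):
            for x in range(r + d, w - r):
                cost = sum(row_sums[y + k][x] for k in range(-r, r + 1))
                # Keep the disparity with the lowest cost seen so far (winner-take-all).
                if best_cost[y][x] is None or cost < best_cost[y][x]:
                    best_cost[y][x] = cost
                    disp[y][x] = d
    return disp
```

Because the row sums for one disparity can be produced as pixels stream in and then consumed by the column stage, a split of this kind lets each stage run as its own pipeline step; a hardware version would also update the sums incrementally rather than re-summing every window as this sketch does.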

Citations: 0

High Performance FPGA-based Post Quantum Cryptography Implementations

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00076

Ziying Ni, A. Khalid, Máire O’Neill

Citations: 1

Reducing FPGA Memory Footprint of Stencil Codes through Automatic Extraction of Memory Patterns

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00033

Robert Szafarczyk, S. Nabi, W. Vanderbauwhede

Abstract: FPGAs are attractive for scientific high-performance computing due to their potential for high performance-per-Watt. Stencil codes in scientific applications are difficult to optimize on FPGAs because of redundant, non-contiguous memory accesses to relatively low-bandwidth DRAM. In this paper, we present an algorithm to aggressively reduce on-chip block RAM (BRAM) and off-chip DRAM utilisation of stencil codes running on FPGAs. The algorithm extracts memory accesses from computational pipelines and removes all redundant intermediate arrays, including those used for stencil buffering, by trading DRAM accesses for computation. The algorithm is based on rewrite rules on a strict functional representation derived from Fortran code and generates provably correct, optimized code. Typical FPGA implementations store the stencil window in on-chip shift registers implemented in BRAMs; we use only DRAM and optimize the memory accesses instead. Our approach dramatically reduces BRAM usage, so the domain size is limited only by the available DRAM. We report drops of 78% and 18% in BRAM usage for 3-D and 2-D stencil codes, respectively, compared to a manual implementation using shift registers, while staying competitive in performance or even improving performance-per-Watt.
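The buffering trade-off described above can be illustrated with a toy 2-D five-point stencil. This minimal Python sketch is not the authors' rewrite-rule algorithm: the function names are hypothetical, and a plain nested list stands in for DRAM. The shift-register version reads each input once but must hold a raster-order window of 2W + 1 elements on chip (W = row width), so its buffer grows with the domain; the direct version keeps no window buffer and instead issues five reads per output, which is the access cost the paper's approach then works to optimize.

```python
from collections import deque


def stencil_direct(grid):
    """5-point sum computed by re-reading neighbours from the input array
    (standing in for DRAM): no window buffer, five reads per output."""
    h, w = len(grid), len(grid[0])
    out, reads = {}, 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            vals = [grid[y][x], grid[y - 1][x], grid[y + 1][x], grid[y][x - 1], grid[y][x + 1]]
            reads += 5
            out[(y, x)] = sum(vals)
    return out, reads


def stencil_shift_register(grid):
    """Same 5-point sum using a raster-order shift register of depth 2W + 1:
    each input is read exactly once, but the buffer scales with row width W."""
    h, w = len(grid), len(grid[0])
    depth = 2 * w + 1
    buf = deque(maxlen=depth)  # oldest element drops out as a new one shifts in
    out, reads = {}, 0
    for i in range(h * w):
        buf.append(grid[i // w][i % w])
        reads += 1
        if len(buf) == depth:
            # buf[j] holds element i - 2w + j, so the centre (i - w) sits at buf[w].
            y, x = divmod(i - w, w)
            if 1 <= y < h - 1 and 1 <= x < w - 1:
                north, west, centre, east, south = buf[0], buf[w - 1], buf[w], buf[w + 1], buf[2 * w]
                out[(y, x)] = north + west + centre + east + south
    return out, reads, depth
```

For a 5 × 6 grid, both versions produce identical results; the shift-register version performs 30 reads with a 13-element buffer, while the direct version performs 60 reads with no buffer.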

Citations: 0

Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00027

Z. Li, Mengshu Sun, Alec Lu, Haoyu Ma, Geng Yuan, Yanyue Xie, Hao Tang, Yanyu Li, M. Leeser, Zhangyang Wang, Xue Lin, Zhenman Fang

Citations: 12

FPL Demo: Kyokko - An Aurora 64b66b compatible 100 Gbps Communication Controller

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00088

A. Tomori, Yasunori Osana

Citations: 1

Speeding Up Optimal Modulo Scheduling with Rational Initiation Intervals

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00056

Nicolai Fiege, Patrick Sittel, P. Zipf

Citations: 2

FPL Demo: Logic Shrinkage: A Neural Architecture Search-Based Approach to FPGA Netlist Generation

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI: 10.1109/FPL57034.2022.00086

Marie Auffret, Erwei Wang, James J. Davis

Citations: 0