The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays: Latest Papers (Page 3)

Interactive Debugging at IP Block Interfaces in FPGAs

The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2021-02-17 DOI: 10.1145/3431920.3439305

M. Merlini, Isamu Poy, P. Chow

Citations: 2

AutoSA

The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2021-02-17 DOI: 10.1145/3431920.3439292

Jie Wang, Licheng Guo, J. Cong

Citations: 5

Top-down Physical Design of Soft Embedded FPGA Fabrics

The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2021-02-17 DOI: 10.1145/3431920.3439297

P. Mohan, Oguz Atli, Onur O. Kibar, Mohammed Zackriya, Larry Pileggi, K. Mai

Abstract: In recent years, IC reverse engineering and IC fabrication supply chain security have grown to become significant economic and security threats for designers, system integrators, and end customers. Many of the existing logic locking and obfuscation techniques have been shown to be vulnerable to attack once the attacker has access to the design netlist, either through reverse engineering or through an untrusted fabrication facility. We introduce soft embedded FPGA redaction, a hardware obfuscation approach that allows the designer to substitute security-critical IP blocks within a design with a synthesizable eFPGA fabric. This method fully conceals the logic and the routing of the critical IP and is compatible with standard ASIC flows for easy integration and process portability. To demonstrate eFPGA redaction, we obfuscate a RISC-V control path and a GPS P-code generator. We also show that the modified netlists are resilient to SAT attacks with moderate VLSI overheads. The secure RISC-V design has 1.89x area and 2.36x delay overhead, while the GPS design has 1.39x area and negligible delay overhead, when implemented on an industrial 22nm FinFET CMOS process.
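The redaction idea in the abstract above rests on the fact that a lookup table (LUT) carries no function until it is configured. The following is a minimal sketch of that generic concept only; the abstract does not describe the actual fabric, so `lut_eval`, `XOR2`, and `AND2` are hypothetical names chosen for illustration.

```python
from typing import Sequence


def lut_eval(config_bits: Sequence[int], inputs: Sequence[int]) -> int:
    """Evaluate a k-input LUT: the inputs select one configuration bit."""
    if len(config_bits) != 1 << len(inputs):
        raise ValueError("a k-input LUT needs 2**k configuration bits")
    index = 0
    for bit in inputs:  # inputs[0] is the most significant select bit
        index = (index << 1) | (bit & 1)
    return config_bits[index]


# Two hypothetical bitstreams for the same 2-input LUT.
# The structure is identical; only the configuration differs.
XOR2 = [0, 1, 1, 0]
AND2 = [0, 0, 0, 1]

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, lut_eval(XOR2, [a, b]), lut_eval(AND2, [a, b]))
```

Someone who sees only the LUT structure in a netlist cannot tell whether it implements XOR or AND. In this toy model, that is the sense in which a fabric without its bitstream conceals the logic.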

Citations: 9

MOCHA: Multinode Cost Optimization in Heterogeneous Clouds with Accelerators

The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2021-02-17 DOI: 10.1145/3431920.3439304

Peipei Zhou, Jiayi Sheng, Cody Hao Yu, Peng Wei, Jie Wang, Di Wu, J. Cong

Citations: 10

Folded Integer Multiplication for FPGAs

The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2021-02-17 DOI: 10.1145/3431920.3439299

M. Langhammer, B. Pasca

Abstract: Encryption, especially key exchange algorithms such as RSA, is an increasingly common use model for FPGAs, driven by the adoption of the FPGA as a SmartNIC in the datacenter. While bulk encryption such as AES maps well to generic FPGA features, the very large multipliers required for RSA are a much more difficult problem. Although FPGAs contain thousands of small integer multipliers in DSP Blocks, aggregating them into very large multipliers is very challenging because of the large amount of soft logic required, especially in the form of long adders, and the high embedded multiplier count. In this paper, we describe a large multiplier architecture that operates in a multi-cycle format and which has a linear area/throughput ratio. We show results for a 2048-bit multiplier that has a latency of 118 cycles, inputs data every 9th cycle, and closes timing at 377 MHz in an Intel Arria 10 FPGA, and over 400 MHz in a Stratix 10. The proposed multiplier uses 1/9 of the DSP resources typically used in a 2048-bit Karatsuba implementation, showing a perfectly linear throughput to DSP-count ratio. Our proposed solution outperforms recently reported results in either arithmetic complexity (by making use of Karatsuba techniques) or scheduling efficiency (embedded DSP resources are fully utilized).
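The abstract above builds on Karatsuba multiplication. The following is a sketch of the textbook recursive algorithm in software, not the authors' folded hardware architecture. The function name `karatsuba` and the 64-bit base-case threshold are assumptions for illustration. The timing constants at the end are the Arria 10 figures quoted in the abstract.

```python
def karatsuba(x: int, y: int, threshold_bits: int = 64) -> int:
    """Multiply two non-negative integers with recursive Karatsuba splitting."""
    if x.bit_length() <= threshold_bits or y.bit_length() <= threshold_bits:
        return x * y  # base case: stands in for a small hardware multiplier
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    low = karatsuba(x_lo, y_lo, threshold_bits)
    high = karatsuba(x_hi, y_hi, threshold_bits)
    # One extra multiplication replaces the two cross products x_hi*y_lo and x_lo*y_hi.
    mid = karatsuba(x_lo + x_hi, y_lo + y_hi, threshold_bits) - low - high
    return (high << (2 * half)) + (mid << half) + low


# Figures quoted in the abstract for the 2048-bit multiplier on Arria 10.
CLOCK_HZ = 377e6
ISSUE_INTERVAL_CYCLES = 9
LATENCY_CYCLES = 118

throughput_per_s = CLOCK_HZ / ISSUE_INTERVAL_CYCLES  # multiplications started per second
latency_s = LATENCY_CYCLES / CLOCK_HZ                # time from input to result

if __name__ == "__main__":
    a = (1 << 2047) + 12345
    b = (1 << 2000) - 1
    print(karatsuba(a, b) == a * b)
    print(f"{throughput_per_s:.3e} multiplications/s, latency {latency_s * 1e9:.0f} ns")
```

Each recursion level uses three half-size multiplications instead of four. This is the source of the arithmetic-complexity advantage the abstract refers to.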

Citations: 7

Probabilistic Optimization for High-Level Synthesis

The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2021-02-17 DOI: 10.1145/3431920.3439455

Jianyi Cheng, John Wickerson, G. Constantinides

Citations: 0

Simulating and Evaluating a Quaternary Logic FPGA Based on Floating-gate Memories and Voltage Division

The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2021-02-17 DOI: 10.1145/3431920.3439471

Ayokunle Fadamiro, Pouyan Rezaie, S. Millican, Christopher Harris

Abstract: Technology scaling cannot meet consumer demands, especially for binary circuits. Previous studies proposed addressing this with multi-valued logic (MVL) architectures, but these architectures use non-standard fabrication techniques and optimistic performance analysis. This study presents a new quaternary FPGA (QFPGA) architecture based on floating-gate memories that standard CMOS fabrication can fabricate: programming floating-gates implement a voltage divider, and these divided voltages represent one of four distinct logic values. When simulated with open-source FinFET SPICE models, the proposed architecture obtains competitive delay and power performance compared to equivalent binary and QFPGA architectures from literature. Results show the proposed QFPGA basic logic element (BLE) requires half the area and dissipates a third of the power density compared to QFPGA architectures from literature. When projecting BLE performance onto benchmark circuits, implementing circuits requires up to 55% less area and one-third the power, and the proposed architecture can operate at clock speeds up to three times faster than binary equivalents. Future studies will investigate accurate modeling of interconnects to better account for their performance impacts and will explore efficient architectures for programming MVL memories when they're used in FPGAs.
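As a rough illustration of the four-valued signalling described in the abstract above, the sketch below encodes an integer as base-4 digits and maps each digit to a voltage. The evenly spaced levels between 0 and `vdd` are an assumption made here for illustration. The abstract states only that divided voltages represent four distinct logic values, and the helper names are hypothetical.

```python
from typing import List


def to_quaternary(value: int, width: int) -> List[int]:
    """Return `width` base-4 digits of a non-negative integer, most significant first."""
    if value < 0 or value >= 4 ** width:
        raise ValueError("value does not fit in the requested number of digits")
    digits = []
    for _ in range(width):
        digits.append(value % 4)
        value //= 4
    return digits[::-1]


def level_voltage(digit: int, vdd: float = 1.0) -> float:
    """Assumed mapping: four evenly spaced levels between 0 and vdd."""
    if digit not in (0, 1, 2, 3):
        raise ValueError("a quaternary digit must be 0, 1, 2 or 3")
    return vdd * digit / 3


if __name__ == "__main__":
    digits = to_quaternary(27, 4)
    print(digits, [round(level_voltage(d), 3) for d in digits])
```

Each quaternary digit carries two bits, so an 8-bit value needs four quaternary wires instead of eight binary ones.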

Citations: 0

SWIFT: Small-World-based Structural Pruning to Accelerate DNN Inference on FPGA

The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2021-02-17 DOI: 10.1145/3431920.3439465

Yufei Ma, Gokul Krishnan, Yu Cao, Le Ye, Ru Huang

Abstract: State-of-the-art DNN pruning approaches have achieved high sparsity. However, these methods usually do not consider the intrinsic graph property of DNNs, leading to an irregular pruned network. Consequently, hardware accelerators cannot directly benefit from such pruning and incur additional cost in indexing, control, and data paths. Inspired by the observation that the brain and real-world networks follow a Small-World model, we propose a graph-based progressive structural pruning technique, SWIFT, that integrates local clusters and global sparsity in DNNs to benefit the dataflow and workload balance of the accelerators. In particular, we propose an output stationary FPGA architecture to accelerate DNN inference and integrate it with the structural sparsity by SWIFT, so that the communication and computation of clustered zero weights are eliminated. In addition, a full mesh data router is designed to adaptively direct inputs into corresponding processing elements (PEs) for different layer configurations and to skip zero operations. The proposed SWIFT is evaluated with multiple DNNs on different datasets. It achieves a sparsity ratio of up to 76% for CIFAR-10, 83% for CIFAR-100, and 76% for SVHN. Moreover, our proposed SWIFT FPGA accelerator achieves up to 4.4× improvement in throughput for different dense networks with a marginal hardware overhead.
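The benefit of clustered zero weights described in the abstract above can be illustrated with a block-wise matrix-vector product that skips blocks containing only zeros. This is a generic software sketch of structured sparsity, not the SWIFT pruning algorithm or its FPGA dataflow. The function names and the block size are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def sparsity(w: Matrix) -> float:
    """Fraction of weights that are exactly zero."""
    total = sum(len(row) for row in w)
    zeros = sum(1 for row in w for v in row if v == 0)
    return zeros / total


def block_is_zero(w: Matrix, r0: int, c0: int, size: int) -> bool:
    """True if every weight in the size x size block at (r0, c0) is zero."""
    rows, cols = len(w), len(w[0])
    return all(
        w[r][c] == 0
        for r in range(r0, min(r0 + size, rows))
        for c in range(c0, min(c0 + size, cols))
    )


def block_sparse_matvec(w: Matrix, x: List[float], size: int) -> Tuple[List[float], int]:
    """Compute w @ x block by block; return the result and the number of skipped blocks."""
    rows, cols = len(w), len(w[0])
    y = [0.0] * rows
    skipped = 0
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            if block_is_zero(w, r0, c0, size):
                skipped += 1  # no data movement or multiply needed for this block
                continue
            for r in range(r0, min(r0 + size, rows)):
                for c in range(c0, min(c0 + size, cols)):
                    y[r] += w[r][c] * x[c]
    return y, skipped


if __name__ == "__main__":
    weights = [[1, 2, 0, 0], [3, 4, 0, 0], [0, 0, 0, 0], [0, 0, 5, 6]]
    print(sparsity(weights), block_sparse_matvec(weights, [1, 1, 1, 1], 2))
```

When zeros are scattered irregularly, few blocks are entirely zero and little work can be skipped. Clustering the zeros is what makes the skip test effective.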

Citations: 1

NPE: An FPGA-based Overlay Processor for Natural Language Processing

The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2021-02-17 DOI: 10.1145/3431920.3439477

H. Khan, Asma Khan, Zainab F. Khan, L. Huang, Kun Wang, Lei He

Abstract: In recent years, transformer-based models have shown state-of-the-art results for Natural Language Processing (NLP). In particular, the introduction of the BERT language model brought with it breakthroughs in tasks such as question answering and natural language inference, advancing applications that allow humans to interact naturally with embedded devices. FPGA-based overlay processors have been shown as effective solutions for edge image and video processing applications, which mostly rely on low precision linear matrix operations. In contrast, transformer-based NLP techniques employ a variety of higher precision nonlinear operations with significantly higher frequency. We present NPE, an FPGA-based overlay processor that can efficiently execute a variety of NLP models. NPE offers software-like programmability to the end user and, unlike FPGA designs that implement specialized accelerators for each nonlinear function, can be upgraded for future NLP models without requiring reconfiguration. NPE can meet real-time conversational AI latency targets for the BERT language model with 4x lower power than CPUs and 6x lower power than GPUs. We also show NPE uses 3x fewer FPGA resources relative to comparable BERT network-specific accelerators in the literature. NPE provides a cost-effective and power-efficient FPGA-based solution for Natural Language Processing at the edge.

Citations: 21

Demystifying the Memory System of Modern Datacenter FPGAs for Software Programmers through Microbenchmarking

The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2021-02-17 DOI: 10.1145/3431920.3439284

Alec Lu, Zhenman Fang, Weihua Liu, Lesley Shannon

Abstract: With the public availability of FPGAs from major cloud service providers like AWS, Alibaba, and Nimbix, hardware and software developers can now easily access FPGA platforms. However, it is nontrivial to develop efficient FPGA accelerators, especially for software programmers who use high-level synthesis (HLS). The major goal of this paper is to figure out how to efficiently access the memory system of modern datacenter FPGAs in HLS-based accelerator designs. This is especially important for memory-bound applications; for example, a naive accelerator design utilizes less than 5% of the available off-chip memory bandwidth. To achieve our goal, we first identify a comprehensive set of factors that affect the memory bandwidth, including 1) the number of concurrent memory access ports, 2) the data width of each port, 3) the maximum burst access length for each port, and 4) the size of consecutive data accesses. Then we carefully design a set of HLS-based microbenchmarks to quantitatively evaluate the performance of the Xilinx Alveo U200 and U280 FPGA memory systems when changing these factors, and provide insights into efficient memory access in HLS-based accelerator designs. To demonstrate the usefulness of our insights, we also conduct two case studies to accelerate the widely used K-nearest neighbors (KNN) and sparse matrix-vector multiplication (SpMV) algorithms. Compared to the baseline designs, optimized designs leveraging our insights achieve about 3.5x and 8.5x speedups for the KNN and SpMV accelerators.
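The first two factors listed in the abstract above, port count and port data width, set a simple upper bound on the bandwidth a kernel can request. The sketch below computes that bound and a utilization ratio. The 512-bit width, 300 MHz clock, and measured figure are hypothetical example values, not measurements from the paper, and the model ignores DRAM-side limits.

```python
def port_bandwidth_gbs(width_bits: int, clock_mhz: float) -> float:
    """Upper bound for one port in GB/s: one full-width transfer per clock cycle."""
    bytes_per_cycle = width_bits / 8
    return bytes_per_cycle * clock_mhz * 1e6 / 1e9


def aggregate_bandwidth_gbs(ports: int, width_bits: int, clock_mhz: float) -> float:
    """Upper bound for several identical ports issuing concurrently."""
    return ports * port_bandwidth_gbs(width_bits, clock_mhz)


def utilization(achieved_gbs: float, peak_gbs: float) -> float:
    """Fraction of a peak bandwidth figure actually achieved."""
    return achieved_gbs / peak_gbs


if __name__ == "__main__":
    # Hypothetical kernel: one 512-bit port clocked at 300 MHz.
    peak = port_bandwidth_gbs(512, 300)
    print(f"port upper bound: {peak:.1f} GB/s")
    # Hypothetical measurement of 0.9 GB/s against that bound.
    print(f"utilization: {utilization(0.9, peak):.1%}")
```

How close a design comes to this bound depends on the remaining two factors, burst length and the size of consecutive accesses. These are the quantities the abstract says the microbenchmarks vary.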

Citations: 15