Latest Publications: 2018 International Conference on Field-Programmable Technology (FPT)

An Area-Efficient Out-of-Order Soft-Core Processor Without Register Renaming
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00077
J. Kadomoto, Toru Koizumi, A. Fukuda, Reoma Matsuo, Susumu Mashimo, Akifumi Fujita, Ryota Shioya, H. Irie, S. Sakai
In this paper, we present an out-of-order soft-core processor adopting the STRAIGHT architecture. STRAIGHT has a unique instruction format in which source operands are expressed as distances from producer instructions. This eliminates the need for register renaming and removes the register map table (RMT), which usually consists of a large multi-port RAM, leading to small area, low power consumption, and high scalability of the front-end pipeline width. Moreover, the simplified architecture enables rapid miss-recovery. The prototype is implemented and evaluated on an FPGA. Compared to an out-of-order soft-core processor with a conventional RISC ISA, the proposed soft-core consumes 147-829 fewer LUTs for the front-end pipeline. The evaluation results show that the proposed soft-core operates correctly on an FPGA, with an estimated dynamic power consumption of 0.120 W.
Citations: 3
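The distance-based operand encoding described in the abstract can be illustrated with a toy interpreter. This is a simplified model for illustration, not the actual STRAIGHT ISA: every instruction produces exactly one value, and a source operand is a distance d naming the result of the instruction issued d slots earlier, so there are no architectural register names and hence nothing to rename.

```python
def run_straight(program, inputs):
    """Execute (op, dist_a, dist_b) instructions; results[i] is the value
    produced by slot i, and an operand distance d in slot i reads
    results[i - d] -- the result of the instruction issued d slots earlier."""
    results = list(inputs)                   # externally supplied live-ins
    ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    for i, (op, da, db) in enumerate(program, start=len(results)):
        results.append(ops[op](results[i - da], results[i - db]))
    return results[-1]

# (3 + 4) * 3: the mul reads the add's result at distance 1 and the
# live-in 3 produced three slots earlier, at distance 3.
value = run_straight([("add", 2, 1), ("mul", 1, 3)], [3, 4])
```

Because every operand is resolved purely by position, the front end needs no multi-port RMT lookup, which is the source of the LUT savings the paper reports.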
Injecting FPGA Configuration Faults in Parallel
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00037
Shane T. Fleming, David B. Thomas
When using SRAM-based FPGA devices in safety-critical applications, testing against bit-flips in the device configuration memory is essential. Often such tests are performed by corrupting configuration memory bits of a running device, but this poses many scalability, reliability, and flexibility challenges. In this paper, we present a framework and a concrete implementation of a parallel fault injection cluster that addresses these challenges. Scalability is addressed by using multiple identical FPGA devices, each testing a different region in parallel. Reliability is addressed by using reconfigurable system-on-chip devices that are isolated from each other. Flexibility is addressed by using a pending-commit structure that continually checkpoints the overall experiment and allows elastic scaling. We test and showcase our approach by exhaustively flipping every bit in the configuration memory of the CHStone benchmark suite and a Vivado HLS-generated k-means clustering image processing application. Our results show that linear scaling is possible as the number of devices increases; that the majority of error-inducing bit-flips in the k-means application do not significantly impact the output; and that the Xilinx Essential Bits tool may miss some bits that can induce errors.
Citations: 3
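The core loop being parallelized can be sketched in a few lines (an illustrative model, not the paper's framework): flip one configuration bit, run the design against a golden result, record whether the flip induced an error, and restore the bit before the next trial. The cluster in the paper distributes exactly this sweep across many identical FPGAs, each covering a different region of the bit space.

```python
def injection_campaign(config_bits, run_design):
    """config_bits: bytearray modelling configuration memory.
    run_design: callable returning True iff the design still matches the
    golden output under the current (possibly corrupted) configuration."""
    errors = []
    for bit in range(len(config_bits) * 8):
        byte, pos = divmod(bit, 8)
        config_bits[byte] ^= 1 << pos        # inject: flip one config bit
        if not run_design(config_bits):
            errors.append(bit)               # record an error-inducing bit
        config_bits[byte] ^= 1 << pos        # repair before the next trial
    return errors

# Toy "design" that is only correct while byte 0 keeps its value 0x0F:
faults = injection_campaign(bytearray([0x0F, 0x00]),
                            lambda cfg: cfg[0] == 0x0F)
```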
Tatum: Parallel Timing Analysis for Faster Design Cycles and Improved Optimization
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00026
Kevin E. Murray, Vaughn Betz
Static Timing Analysis (STA) is used to evaluate the correctness and performance of a digital circuit implementation. In addition to final sign-off checks, STA is called numerous times during placement and routing to guide optimization. As a result, STA consumes a significant fraction of the time required for design implementation; to make progress reducing FPGA compile times, we need faster STA. We evaluate the suitability of both GPU and multi-core CPU platforms for accelerating STA. On core STA algorithms, our GPU kernel achieves a 6.2x kernel speed-up, but data transfer overhead reduces this to 0.9x. Our best CPU implementation achieves a 9.2x parallel speed-up on 32 cores, yielding a 15.2x overall speed-up compared to the VPR analyzer, and a 6.9x larger parallel speed-up than a recent parallel ASIC timing analyzer. We then show how reducing the run-time cost of STA can be leveraged to improve optimization quality, reducing critical path delay by 4%.
Citations: 15
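The computation being parallelized is the arrival-time propagation at the heart of block-based STA. A minimal sequential sketch is below (for illustration only; a real parallel analyzer levelizes the timing graph so that all nodes in one level, which depend only on earlier levels, can be processed concurrently):

```python
from collections import defaultdict

def arrival_times(edges, primary_inputs):
    """edges: list of (src, dst, delay); returns {node: latest arrival time}."""
    fanin = defaultdict(list)
    for src, dst, delay in edges:
        fanin[dst].append((src, delay))
    arr = {pi: 0.0 for pi in primary_inputs}   # arrival at primary inputs is 0

    def at(node):
        # Latest arrival = max over fan-in of (source arrival + edge delay).
        if node not in arr:
            arr[node] = max(at(src) + d for src, d in fanin[node])
        return arr[node]

    for node in list(fanin):
        at(node)
    return arr

# Tiny graph: a,b -> x (delays 1.0, 3.0), then x -> y (delay 2.0).
times = arrival_times([("a", "x", 1.0), ("b", "x", 3.0), ("x", "y", 2.0)],
                      ["a", "b"])
```

The max-plus structure of this traversal is what makes both the GPU kernel and the multi-core CPU version possible: within a level there are no dependencies between nodes.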
Evaluating The Highly-Pipelined Intel Stratix 10 FPGA Architecture Using Open-Source Benchmarks
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00038
Tian Tan, E. Nurvitadhi, D. Shih, Derek Chiou
Intel Stratix 10 FPGAs offer a novel architectural feature called HyperFlex that enables an extreme degree of pipelining, resulting in clock frequencies of up to 1 GHz. Prior work evaluated HyperFlex on pre-production Stratix 10 FPGAs using internal designs not accessible to the general public. This paper presents an updated evaluation of HyperFlex on the latest publicly available production Stratix 10 FPGA using open-source benchmarks. Our evaluation started with seven RTL designs from existing open-source projects, carefully chosen to capture a variety of architectures (from simple pipelines to pipelines with loops/M20Ks/DSPs) implementing well-known functions such as crypto, math, and image processing. An FPGA developer who was not an expert in HyperFlex then spent around 250 engineering hours developing 24 optimized versions of these designs, following the Intel Stratix 10 FPGA HyperFlex optimization guide. The optimized designs run at 400 MHz to 850 MHz. In this paper, we describe the optimizations, efforts, and results. Upon publication, the optimized designs will be open-sourced and published on GitHub.
Citations: 6
Synthesizable Heterogeneous FPGA Fabrics
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00040
Brett Grady, J. Anderson
We present an automated framework for the generation of synthesizable FPGAs with heterogeneous functional blocks and carry chains, as modelled with the open-source Verilog-to-Routing (VTR) FPGA architecture evaluation framework. VTR's modelling of hardened blocks, such as DSPs and BRAMs, is leveraged to generate synthesizable FPGAs mappable via VTR's Verilog frontend. The generated Verilog source for the FPGA can be synthesized to any conventional semiconductor process via industry-standard ASIC toolflows with minimal implementation effort. We model a Stratix IV-style FPGA architecture, complete with carry chains, DSPs, and BRAMs, and compare area/performance with the commercial Stratix IV FPGA. For a set of benchmarks using the heterogeneous blocks, the area and performance gaps between the fully synthesizable and commercial fabrics are 3.2x and 2.3x, respectively. Optimizations to reduce the gap are discussed.
Citations: 15
Memory-Efficient Architecture for Accelerating Generative Networks on FPGA
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00016
Shuanglong Liu, Chenglong Zeng, Hongxiang Fan, Ho-Cheung Ng, Jiuxi Meng, Zhiqiang Que, Xinyu Niu, W. Luk
Generative adversarial networks (GANs) are a class of artificial intelligence algorithms used in unsupervised machine learning, implemented by a system of two neural networks: a generative network (generator) and a discriminative network (discriminator). These two networks compete with each other to perform better at their respective tasks. The generator is typically a deconvolutional neural network and the discriminator is a convolutional neural network (CNN). Deconvolution performs a fundamentally different mathematical operation from convolution. While FPGA-based CNN accelerators have been widely studied in prior work, the acceleration of deconvolutional networks on FPGAs is rarely explored. This paper proposes a novel parametrized deconvolutional architecture based on an FPGA-friendly method, in contrast to the transposed-convolution implementation used on CPUs and GPUs. Hardware design templates which map this architecture to FPGAs are provided with configurable deconvolutional layer parameters. Furthermore, a memory-efficient architecture with a new tiling method is proposed to accelerate the generator of GANs, storing all intermediate data in on-chip memories and significantly reducing off-chip data transfers. The performance of the proposed accelerator is evaluated using a variety of GANs on a Xilinx Zynq 706 board, showing 2.3x higher speed and an 8.2x reduction in off-chip memory accesses compared to an optimized vanilla FPGA design. Compared to the respective implementations on CPUs and GPUs, the achieved improvements range from 30x-92x in speed over an Intel 8-core i7-950 CPU, and 8x-108x in performance-per-watt over an NVIDIA Titan X GPU.
Citations: 13
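For reference, the operation the accelerator targets is transposed convolution ("deconvolution"), sketched here as a naive 1-D scatter loop. This is the CPU/GPU-style formulation the paper's FPGA-friendly method reworks, not the paper's architecture itself:

```python
def transposed_conv1d(x, kernel, stride=2):
    """Each input element scatters a scaled copy of the kernel into the
    output, upsampling the signal -- the inverse data movement of a
    strided convolution."""
    out_len = (len(x) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, v in enumerate(x):
        for k, w in enumerate(kernel):
            out[i * stride + k] += v * w
    return out

# Two inputs become five outputs (stride 2, kernel width 3).
y = transposed_conv1d([1.0, 2.0], [1.0, 1.0, 1.0], stride=2)
```

The scatter-with-overlap access pattern (adjacent kernel copies accumulating into the same outputs) is exactly what makes a direct hardware mapping awkward and motivates a reformulated dataflow.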
DaCO: A High-Performance Token Dataflow Coprocessor Overlay for FPGAs
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00032
Siddhartha, Nachiket Kapre
Dataflow computing architectures exploit dynamic parallelism at the fine granularity of individual operations and provide a pathway to overcome the performance and energy limits of conventional von Neumann models. In this vein, we present DaCO (Dataflow Coprocessor FPGA Overlay), a high-performance compute organization for FPGAs that delivers up to 2.5x speedup over existing dataflow alternatives. Historically, dataflow-style execution has been viewed as an attractive parallel computing paradigm due to the self-timed, decentralized implementation of dataflow dependencies and the absence of sequential program counters. However, realizing high-performance dataflow computers has remained elusive, largely due to the complexity of scheduling this parallelism and to data communication bottlenecks. DaCO achieves its performance by (1) supporting large-scale (1000s of nodes) out-of-order scheduling using hierarchical lookup, (2) priority-aware routing of dataflow dependencies using the efficient Hoplite-Q NoC, and (3) clustering techniques that exploit data locality in the communication network organization. Each DaCO processing element is a programmable soft processor that communicates with the others over a packet-switching network-on-chip (PSNoC). We target the Arria 10 AX115S FPGA to take advantage of its hard floating-point DSP blocks, and maximize performance by multipumping the M20K Block RAMs. Overall, we can scale DaCO to 450 processors operating at an fmax of 250 MHz on the target platform. Each soft processor consumes 779 ALMs, 4 M20K BRAMs, and 3 hard floating-point DSP blocks for optimum balance, while the on-chip communication framework consumes less than 15% of the on-chip resources.
Citations: 0
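The execution model DaCO implements in hardware can be shown with a toy token-dataflow scheduler (a conceptual sketch only; DaCO adds hierarchical priority scheduling, clustering, and a real NoC, all omitted here). There is no program counter: a node fires as soon as all of its operand tokens have arrived, so independent operations proceed out of order automatically.

```python
from operator import add, mul

def run_dataflow(nodes, inputs):
    """nodes: {name: (fn, arity, consumers)}; inputs: list of (node, value)
    tokens injected at the sources. A node fires once it has collected
    'arity' operand tokens; the worklist pop order is arbitrary, so
    execution is naturally out of order, as in a token dataflow machine."""
    inbox = {name: [] for name in nodes}
    tokens = list(inputs)                     # in-flight tokens
    results = {}
    while tokens:
        node, value = tokens.pop()
        inbox[node].append(value)
        fn, arity, consumers = nodes[node]
        if len(inbox[node]) == arity:         # dataflow firing rule
            result = fn(*inbox[node])
            results[node] = result
            tokens.extend((c, result) for c in consumers)
    return results

# (2 + 3) fed twice into a multiplier: computes (2 + 3) * (2 + 3) = 25.
graph = {"a": (add, 2, ["m", "m"]), "m": (mul, 2, [])}
out = run_dataflow(graph, [("a", 2), ("a", 3)])
```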
FPGA Architecture Enhancements for Efficient BNN Implementation
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00039
Jin Hee Kim, Jongeun Lee, J. Anderson
Binarized neural networks (BNNs) are ultra-reduced-precision neural networks, having weights and activations restricted to single-bit values. BNN computations operate on bitwise data, making them particularly amenable to hardware implementation. In this paper, we first analyze BNN implementations on contemporary commercial 20nm FPGAs. We then propose two lightweight architectural changes that significantly improve the logic density of FPGA BNN implementations. The changes involve incorporating additional carry-chain circuitry into logic elements, where the additional circuitry is connected in a specific way to benefit BNN computations. The architectural changes are evaluated in the context of state-of-the-art Intel and Xilinx FPGAs and shown to provide over 2x area reduction in the key BNN computational task (the XNOR-popcount sub-circuit), at a modest performance cost of less than 2%.
Citations: 8
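The XNOR-popcount kernel the paper targets is small enough to state directly. With {-1,+1} values encoded one bit per element (bit 1 for +1, bit 0 for -1), a binary dot product reduces to a popcount of an XNOR, rescaled back to the signed domain:

```python
def bnn_dot(w_bits, a_bits, n):
    """Dot product of two n-element {-1,+1} vectors packed one bit per
    element (bit = 1 encodes +1, bit = 0 encodes -1)."""
    mask = (1 << n) - 1
    xnor = ~(w_bits ^ a_bits) & mask      # bit set where the signs agree
    matches = bin(xnor).count("1")        # popcount
    return 2 * matches - n                # agreements minus disagreements

# w = (+1, +1, -1, +1) -> 0b1011 and a = (+1, +1, +1, -1) -> 0b0111,
# listing elements LSB-first; signed dot product = 1 + 1 - 1 - 1 = 0.
result = bnn_dot(0b1011, 0b0111, 4)
```

The popcount (an n-input adder tree) dominates the area of this sub-circuit, which is why the paper's extra carry-chain circuitry in the logic elements pays off.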
A Nearest Neighbor Search Engine Using Distance-Based Hashing
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00031
Toshitaka Ito, Yuri Itotani, S. Wakabayashi, Shinobu Nagayama, Masato Inagi
This paper proposes an FPGA-based nearest neighbor search engine for high-dimensional data, in which nearest neighbor search is performed using distance-based hashing. The proposed hardware search engine implements a nearest neighbor search algorithm based on an extension of flexible distance-based hashing (FDH, for short), which finds an exact solution with high probability. The proposed engine is a parallel, pipelined circuit, so that search results can be obtained in a short execution time. Experimental results show the effectiveness and efficiency of the proposed engine.
Citations: 4
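The distance-based-hashing idea can be illustrated in a few lines (a sketch of the general concept only, not the paper's extended FDH algorithm): objects are hashed by thresholding their distances to a few fixed pivot objects, so near neighbors tend to land in the same bucket, and exact distances are computed only for the colliding candidates.

```python
def dbh_key(x, pivots, dist):
    """Hash key: one bit per (pivot, threshold) pair."""
    return tuple(int(dist(x, p) <= t) for p, t in pivots)

def build_index(points, pivots, dist):
    index = {}
    for p in points:
        index.setdefault(dbh_key(p, pivots, dist), []).append(p)
    return index

def query(q, index, pivots, dist):
    """Exact distances are computed only for candidates in the query's bucket."""
    candidates = index.get(dbh_key(q, pivots, dist), [])
    return min(candidates, key=lambda p: dist(q, p), default=None)

d1 = lambda a, b: abs(a - b)                  # 1-D metric for illustration
pts = [1.0, 2.0, 9.0, 10.0]
pivots = [(0.0, 5.0)]                         # one pivot at 0, threshold 5
idx = build_index(pts, pivots, d1)            # buckets {1.0, 2.0} and {9.0, 10.0}
```

The per-candidate distance checks are independent, which is what the paper's parallel, pipelined circuit exploits.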
Software-Specified FPGA Accelerators for Elementary Functions
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00019
J. Chen, Xue Liu, J. Anderson
We use a high-level synthesis (HLS) methodology for the design of hardware accelerators for two elementary functions: reciprocal and square root. The functions are described in C-language software and synthesized into Verilog RTL using the LegUp HLS tool from the University of Toronto [1]. The accelerators are designed to deliver high accuracy, with less than 1 ULP of error compared with GNU software (math.h). Through changes to the HLS constraints, hardware implementations with different speed/area trade-offs can be generated rapidly. In an experimental study, our HLS-generated accelerators are targeted to the Altera/Intel Cyclone V FPGA and compared with hand-designed cores from the FPGA vendor. Results show that our cores offer considerably better resource usage (i.e., ALMs, DSPs, memory bits), while the commercial cores operate at a modestly higher Fmax.
Citations: 0
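One standard way to describe reciprocal and square root in plain software, and the kind of C-like loop an HLS flow starts from, is Newton-Raphson refinement. The sketch below illustrates that style under stated assumptions (the paper does not disclose its cores' exact algorithm or seeding, and the crude seeds here are only valid for moderate inputs):

```python
def reciprocal(x, iters=30):
    """Approximate 1/x via the Newton-Raphson step y <- y * (2 - x*y)."""
    y = 0.1 if x > 1.0 else 1.0          # crude seed, valid for moderate x;
    for _ in range(iters):               # production cores seed from a LUT
        y = y * (2.0 - x * y)
    return y

def sqrt_via_rsqrt(x, iters=30):
    """Approximate sqrt(x) as x * (1/sqrt(x)), refining 1/sqrt(x) with
    the Newton step y <- y * (1.5 - 0.5*x*y*y)."""
    y = 0.1 if x > 1.0 else 1.0
    for _ in range(iters):
        y = y * (1.5 - 0.5 * x * y * y)
    return x * y
```

Each iteration roughly doubles the number of correct bits, so the iteration count is one natural knob for the speed/area/accuracy trade-offs the abstract mentions.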