{"title":"DP-Pack: Distributed Parallel Packing for FPGAs","authors":"Qiangpu Chen, Minghua Shen, Nong Xiao","doi":"10.1109/FPT.2018.00054","DOIUrl":"https://doi.org/10.1109/FPT.2018.00054","url":null,"abstract":"Packing is one of the most critical stages in the FPGA physical syntheses flow. In this paper, we propose DP-Pack, a distributed parallel packing approach. DP-Pack consists of two primary steps. First, all of the minimal circuit units are assigned into several subsets where the conflicting units are located in the same subset and the non-conflicting units are distributed in different subsets. Then, the non-conflicting subsets are partitioned by round robin such that the number of subsets in each processor core is equal approximately, leading to good load balance in parallel packing. Second, the parallelization between processor cores is implemented by the MPI-based message queue in a distributed platform. Note that DP-Pack has been integrated into the VTR 7.0 tool. Experimental results show that our DP-Pack scales to 8 processor cores to provide about 1.4~3.2× runtime advantages with acceptable quality degradation, comparing to the academic state-of-the-art AAPack.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116873208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development of a Robot Car by Single Line Search Method for White Line Detection with FPGA","authors":"Hiromichi Wakatsuki, T. Kido, K. Arai, Yuhei Sugata, K. Ootsu, T. Yokota, Takeshi Ohkawa","doi":"10.1109/FPT.2018.00088","DOIUrl":"https://doi.org/10.1109/FPT.2018.00088","url":null,"abstract":"In level 5 autonomous driving system, image recognition is required as multiplex safety technology. However, real time image recognition is hard for existing microprocessors. Hence, implementation of driving system on FPGA is useful to achieve real time image recognition for autonomous driving. Therefore, this paper describes implementation of autonomous driving robot with image processing using FPGA. Hough transform which is generally used for white line detection, requires high computing cost. We explain our new white line detection method which features low computation cost. As a result of evaluation, image processing performance on software is about 26.1 frame / sec, and on hardware is about 0.5 frame / sec. In addition, hardware implementation using Vivado HLS is described.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123256920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An FPGA Realization of OpenPose Based on a Sparse Weight Convolutional Neural Network","authors":"Akira Jinguji, Tomoya Fujii, Shimpei Sato, Hiroki Nakahara","doi":"10.1109/FPT.2018.00061","DOIUrl":"https://doi.org/10.1109/FPT.2018.00061","url":null,"abstract":"The OpenPose is a kind of a deep learning based pose estimator which achieved a top accuracy for multiple person pose estimations. Even if using the OpenPose, it is necessary to used high-performance GPU since it requires massive parameters access with high-bandwidth off-chip GDDR5 memories and a higher operation clock frequency. Thus, the power consumption becomes a critical issue to realization. Also, its computation time is slower than the current video standard frame speed (29.97 FPS). In the paper, we introduce a sparse weight CNN to reduce the amount of memory size for weights, which is Then, we offer the indirect memory access architecture to realize the sparse CNN convolutional operation efficiently. Also, to increase throughput further, we applied the six stages of pipeline architecture with a pipeline buffer memory realization. Our implementation satisfied the timing constraint for real-time applications. Since our architecture computed an image with 42.6 msec, the number of frames per second (FPS) was 23.43. We measured the total board power consumption: It was 55 Watt. Thus, the performance per power efficiency was 0.444 (FPS/W). Compared with the NVidia Titan X Pascal architecture GPU, it was 3.49 times faster, it dissipated 3.54 times lower power, and its performance per power efficiency was 13.05 times better. 
As far as we know, this work is the first FPGA implementation of the OpenPose.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117200830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development of a Control Target Recognition for Autonomous Vehicle Using FPGA with Python","authors":"Hiroki Bingo","doi":"10.1109/FPT.2018.00089","DOIUrl":"https://doi.org/10.1109/FPT.2018.00089","url":null,"abstract":"As an easy development of autonomous driving requiring enormous calculation and electric power, a scheme using FPGA is proposed. To reduce programming effort, a board enabling employment of Python is used, together with high-level libraries. The feasibility of algorithms (white line detection, human detection, etc.) on the FPGA board are investigated.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122949226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Speed Computation of CRC Codes for FPGAs","authors":"Jakub Cabal, Lukás Kekely, J. Korenek","doi":"10.1109/FPT.2018.00042","DOIUrl":"https://doi.org/10.1109/FPT.2018.00042","url":null,"abstract":"As the throughput of networks and memory interfaces is on a constant rise, there is a need for ever-faster error-detecting codes. Cyclic redundancy checks (CRC) are a common and widely used to ensure consistency or detect accidental changes of data. We propose a novel FPGA architecture for the computation of the CRC designed for general high-speed data transfers. Its key feature is allowing a processing of multiple independent data packets (transactions) in each clock cycle, what is a necessity for achieving high overall throughput on very wide data buses. Experimental results confirm that the proposed architecture reaches an effective throughput sufficient for utilization in multi-terabit Ethernet networks (over 2 Tbps or over 3000 Mpps) on a single Xilinx UltraScale+ FPGA.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116031905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementing NEF Neural Networks on Embedded FPGAs","authors":"Benjamin Morcos, T. Stewart, C. Eliasmith, Nachiket Kapre","doi":"10.1109/FPT.2018.00015","DOIUrl":"https://doi.org/10.1109/FPT.2018.00015","url":null,"abstract":"Low-power, high-speed neural networks are critical for providing deployable embedded AI applications at the edge. We describe an FPGA implementation of Neural Engineering Framework (NEF) networks with online learning that outperforms mobile GPU implementations by an order of magnitude or more. Specifically, we provide an embedded Python-capable PYNQ FPGA implementation supported with a High-Level Synthesis (HLS) workflow that allows sub-millisecond implementation of adaptive neural networks with low-latency, direct I/O access to the physical world. We tune the precision of the different intermediate variables in the code to achieve competitive absolute accuracy against slower and larger floating-point reference designs. The online learning component of the neural network exploits immediate feedback to adjust the network weights to best support a given arithmetic precision. As the space of possible design configurations of such networks is vast and is subject to a target accuracy constraint, we use the Hyperopt hyper-parameter tuning tool instead of manual search to find Pareto optimal designs. Specifically, we are able to generate the optimized designs in under 500 iterations of Vivado HLS before running the complete Vivado place-and-route phase on that subset. For neural network populations of 64-4096 neurons and 1-8 representational dimensions our optimized FPGA implementation generated by Hyperopt has a speedup of 10-484× over a competing cuBLAS implementation on the Jetson TX1 GPU while using 2.4-9.5× less power. 
Our speedups are a result of HLS-specific reformulation (15× improvement), precision adaptation (4× improvement), and low-latency direct I/O access (1000× improvement).","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128148201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating Top-k ListNet Training for Ranking Using FPGA","authors":"Qiang Li, Shane T. Fleming, David B. Thomas, P. Cheung","doi":"10.1109/FPT.2018.00044","DOIUrl":"https://doi.org/10.1109/FPT.2018.00044","url":null,"abstract":"Document ranking is used to order query results by relevance, with different document ranking models providing trade-offs between ranking accuracy and training speed. ListNet is a well-known ranking approach which achieves high accuracy, but is infeasible in practice because training time is quadratic in the number of training documents. This paper considers the acceleration of ListNet training using FPGAs, and improves training speed by using hardware-oriented algorithmic optimisations, and by transforming algorithm structures to remove dependencies and expose parallelism. We implemented our approach on a Xilinx ultrascale FPGA board and applied it to the MQ 2008 benchmark dataset for ranking. Compared to existing ranking approaches ours shows an improvement from 0.29 to 0.33 in ranking accuracy on the same dataset using the NDCG@10 metric. Taking into account the communication between software and hardware, we are able to achieve a 3.21x speedup over an Intel Xeon1.6 GHz CPU implementation.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128225740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Message from the General Chair and Program Co-Chairs","authors":"","doi":"10.1109/fpt.2018.00005","DOIUrl":"https://doi.org/10.1109/fpt.2018.00005","url":null,"abstract":"","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"547 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133132418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compact Area and Performance Modelling for CGRA Architecture Evaluation","authors":"Kuang-Ping Niu, J. Anderson","doi":"10.1109/FPT.2018.00028","DOIUrl":"https://doi.org/10.1109/FPT.2018.00028","url":null,"abstract":"We present area and performance models for use in coarse-grained reconfigurable array (CGRAs) architectural exploration. The area and performance models can be computed rapidly and are incorporated into the open-source CGRA-ME architecture evaluation framework. Area is modelled by synthesizing (into standard cells) commonly occurring CGRA primitives in isolation, and then aggregating the component-wise areas. For performance, we incorporate a fully fledged static-timing analysis (STA) framework into CGRA-ME. The delays in the STA timing graph are annotated based on: 1) a library component-wise delays for logic/memory, and 2) a fanout-based delay estimation model for interconnect. Performance and area are modelled for both performance-optimized and area-optimized standard-cell CGRA implementations. Accuracy of the area and performance models is within 7% and 10%, respectively, of a fully laid-out standard-cell CGRA implementation.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131274582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High Performance High-Precision Floating-Point Operations on FPGAs Using OpenCL","authors":"N. Nakasato, H. Daisaka, T. Ishikawa","doi":"10.1109/FPT.2018.00049","DOIUrl":"https://doi.org/10.1109/FPT.2018.00049","url":null,"abstract":"Development of high-level synthesis tools such as OpenCL SDK for FPGAs enables us to design accelerators for scientific applications that can take advantage of flexibility and efficiency of FPGAs. However, the available OpenCL SDKs only support the standard floating-point (FP) formats. In this paper, we present the performance evaluation of high precision FP operations, which are currently not supported in OpenCL, on recent FPGAs. By using a mechanism to call a custom design from an OpenCL kernel, we evaluate the performance of a sample application in high precision FP format binary128. We found that the sustained performance of our design in binary128 on Intel Arria10 and Stratix10 is 19 and 71 Gflops, respectively.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128124130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}