2018 International Conference on Field-Programmable Technology (FPT): Latest Publications

DP-Pack: Distributed Parallel Packing for FPGAs
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00054
Qiangpu Chen, Minghua Shen, Nong Xiao
{"title":"DP-Pack: Distributed Parallel Packing for FPGAs","authors":"Qiangpu Chen, Minghua Shen, Nong Xiao","doi":"10.1109/FPT.2018.00054","DOIUrl":"https://doi.org/10.1109/FPT.2018.00054","url":null,"abstract":"Packing is one of the most critical stages in the FPGA physical syntheses flow. In this paper, we propose DP-Pack, a distributed parallel packing approach. DP-Pack consists of two primary steps. First, all of the minimal circuit units are assigned into several subsets where the conflicting units are located in the same subset and the non-conflicting units are distributed in different subsets. Then, the non-conflicting subsets are partitioned by round robin such that the number of subsets in each processor core is equal approximately, leading to good load balance in parallel packing. Second, the parallelization between processor cores is implemented by the MPI-based message queue in a distributed platform. Note that DP-Pack has been integrated into the VTR 7.0 tool. Experimental results show that our DP-Pack scales to 8 processor cores to provide about 1.4~3.2× runtime advantages with acceptable quality degradation, comparing to the academic state-of-the-art AAPack.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116873208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Development of a Robot Car by Single Line Search Method for White Line Detection with FPGA
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00088
Hiromichi Wakatsuki, T. Kido, K. Arai, Yuhei Sugata, K. Ootsu, T. Yokota, Takeshi Ohkawa
{"title":"Development of a Robot Car by Single Line Search Method for White Line Detection with FPGA","authors":"Hiromichi Wakatsuki, T. Kido, K. Arai, Yuhei Sugata, K. Ootsu, T. Yokota, Takeshi Ohkawa","doi":"10.1109/FPT.2018.00088","DOIUrl":"https://doi.org/10.1109/FPT.2018.00088","url":null,"abstract":"In level 5 autonomous driving system, image recognition is required as multiplex safety technology. However, real time image recognition is hard for existing microprocessors. Hence, implementation of driving system on FPGA is useful to achieve real time image recognition for autonomous driving. Therefore, this paper describes implementation of autonomous driving robot with image processing using FPGA. Hough transform which is generally used for white line detection, requires high computing cost. We explain our new white line detection method which features low computation cost. As a result of evaluation, image processing performance on software is about 26.1 frame / sec, and on hardware is about 0.5 frame / sec. In addition, hardware implementation using Vivado HLS is described.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123256920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
An FPGA Realization of OpenPose Based on a Sparse Weight Convolutional Neural Network
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00061
Akira Jinguji, Tomoya Fujii, Shimpei Sato, Hiroki Nakahara
{"title":"An FPGA Realization of OpenPose Based on a Sparse Weight Convolutional Neural Network","authors":"Akira Jinguji, Tomoya Fujii, Shimpei Sato, Hiroki Nakahara","doi":"10.1109/FPT.2018.00061","DOIUrl":"https://doi.org/10.1109/FPT.2018.00061","url":null,"abstract":"The OpenPose is a kind of a deep learning based pose estimator which achieved a top accuracy for multiple person pose estimations. Even if using the OpenPose, it is necessary to used high-performance GPU since it requires massive parameters access with high-bandwidth off-chip GDDR5 memories and a higher operation clock frequency. Thus, the power consumption becomes a critical issue to realization. Also, its computation time is slower than the current video standard frame speed (29.97 FPS). In the paper, we introduce a sparse weight CNN to reduce the amount of memory size for weights, which is Then, we offer the indirect memory access architecture to realize the sparse CNN convolutional operation efficiently. Also, to increase throughput further, we applied the six stages of pipeline architecture with a pipeline buffer memory realization. Our implementation satisfied the timing constraint for real-time applications. Since our architecture computed an image with 42.6 msec, the number of frames per second (FPS) was 23.43. We measured the total board power consumption: It was 55 Watt. Thus, the performance per power efficiency was 0.444 (FPS/W). Compared with the NVidia Titan X Pascal architecture GPU, it was 3.49 times faster, it dissipated 3.54 times lower power, and its performance per power efficiency was 13.05 times better. As far as we know, this work is the first FPGA implementation of the OpenPose.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117200830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Development of a Control Target Recognition for Autonomous Vehicle Using FPGA with Python
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00089
Hiroki Bingo
{"title":"Development of a Control Target Recognition for Autonomous Vehicle Using FPGA with Python","authors":"Hiroki Bingo","doi":"10.1109/FPT.2018.00089","DOIUrl":"https://doi.org/10.1109/FPT.2018.00089","url":null,"abstract":"As an easy development of autonomous driving requiring enormous calculation and electric power, a scheme using FPGA is proposed. To reduce programming effort, a board enabling employment of Python is used, together with high-level libraries. The feasibility of algorithms (white line detection, human detection, etc.) on the FPGA board are investigated.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122949226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
High-Speed Computation of CRC Codes for FPGAs
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00042
Jakub Cabal, Lukás Kekely, J. Korenek
{"title":"High-Speed Computation of CRC Codes for FPGAs","authors":"Jakub Cabal, Lukás Kekely, J. Korenek","doi":"10.1109/FPT.2018.00042","DOIUrl":"https://doi.org/10.1109/FPT.2018.00042","url":null,"abstract":"As the throughput of networks and memory interfaces is on a constant rise, there is a need for ever-faster error-detecting codes. Cyclic redundancy checks (CRC) are a common and widely used to ensure consistency or detect accidental changes of data. We propose a novel FPGA architecture for the computation of the CRC designed for general high-speed data transfers. Its key feature is allowing a processing of multiple independent data packets (transactions) in each clock cycle, what is a necessity for achieving high overall throughput on very wide data buses. Experimental results confirm that the proposed architecture reaches an effective throughput sufficient for utilization in multi-terabit Ethernet networks (over 2 Tbps or over 3000 Mpps) on a single Xilinx UltraScale+ FPGA.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116031905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Implementing NEF Neural Networks on Embedded FPGAs
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00015
Benjamin Morcos, T. Stewart, C. Eliasmith, Nachiket Kapre
{"title":"Implementing NEF Neural Networks on Embedded FPGAs","authors":"Benjamin Morcos, T. Stewart, C. Eliasmith, Nachiket Kapre","doi":"10.1109/FPT.2018.00015","DOIUrl":"https://doi.org/10.1109/FPT.2018.00015","url":null,"abstract":"Low-power, high-speed neural networks are critical for providing deployable embedded AI applications at the edge. We describe an FPGA implementation of Neural Engineering Framework (NEF) networks with online learning that outperforms mobile GPU implementations by an order of magnitude or more. Specifically, we provide an embedded Python-capable PYNQ FPGA implementation supported with a High-Level Synthesis (HLS) workflow that allows sub-millisecond implementation of adaptive neural networks with low-latency, direct I/O access to the physical world. We tune the precision of the different intermediate variables in the code to achieve competitive absolute accuracy against slower and larger floating-point reference designs. The online learning component of the neural network exploits immediate feedback to adjust the network weights to best support a given arithmetic precision. As the space of possible design configurations of such networks is vast and is subject to a target accuracy constraint, we use the Hyperopt hyper-parameter tuning tool instead of manual search to find Pareto optimal designs. Specifically, we are able to generate the optimized designs in under 500 iterations of Vivado HLS before running the complete Vivado place-and-route phase on that subset. For neural network populations of 64-4096 neurons and 1-8 representational dimensions our optimized FPGA implementation generated by Hyperopt has a speedup of 10-484× over a competing cuBLAS implementation on the Jetson TX1 GPU while using 2.4-9.5× less power. Our speedups are a result of HLS-specific reformulation (15× improvement), precision adaptation (4× improvement), and low-latency direct I/O access (1000× improvement).","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128148201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Accelerating Top-k ListNet Training for Ranking Using FPGA
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00044
Qiang Li, Shane T. Fleming, David B. Thomas, P. Cheung
{"title":"Accelerating Top-k ListNet Training for Ranking Using FPGA","authors":"Qiang Li, Shane T. Fleming, David B. Thomas, P. Cheung","doi":"10.1109/FPT.2018.00044","DOIUrl":"https://doi.org/10.1109/FPT.2018.00044","url":null,"abstract":"Document ranking is used to order query results by relevance, with different document ranking models providing trade-offs between ranking accuracy and training speed. ListNet is a well-known ranking approach which achieves high accuracy, but is infeasible in practice because training time is quadratic in the number of training documents. This paper considers the acceleration of ListNet training using FPGAs, and improves training speed by using hardware-oriented algorithmic optimisations, and by transforming algorithm structures to remove dependencies and expose parallelism. We implemented our approach on a Xilinx ultrascale FPGA board and applied it to the MQ 2008 benchmark dataset for ranking. Compared to existing ranking approaches ours shows an improvement from 0.29 to 0.33 in ranking accuracy on the same dataset using the NDCG@10 metric. Taking into account the communication between software and hardware, we are able to achieve a 3.21x speedup over an Intel Xeon1.6 GHz CPU implementation.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128225740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Message from the General Chair and Program Co-Chairs
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/fpt.2018.00005
{"title":"Message from the General Chair and Program Co-Chairs","authors":"","doi":"10.1109/fpt.2018.00005","DOIUrl":"https://doi.org/10.1109/fpt.2018.00005","url":null,"abstract":"","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"547 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133132418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Compact Area and Performance Modelling for CGRA Architecture Evaluation
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00028
Kuang-Ping Niu, J. Anderson
{"title":"Compact Area and Performance Modelling for CGRA Architecture Evaluation","authors":"Kuang-Ping Niu, J. Anderson","doi":"10.1109/FPT.2018.00028","DOIUrl":"https://doi.org/10.1109/FPT.2018.00028","url":null,"abstract":"We present area and performance models for use in coarse-grained reconfigurable array (CGRAs) architectural exploration. The area and performance models can be computed rapidly and are incorporated into the open-source CGRA-ME architecture evaluation framework. Area is modelled by synthesizing (into standard cells) commonly occurring CGRA primitives in isolation, and then aggregating the component-wise areas. For performance, we incorporate a fully fledged static-timing analysis (STA) framework into CGRA-ME. The delays in the STA timing graph are annotated based on: 1) a library component-wise delays for logic/memory, and 2) a fanout-based delay estimation model for interconnect. Performance and area are modelled for both performance-optimized and area-optimized standard-cell CGRA implementations. Accuracy of the area and performance models is within 7% and 10%, respectively, of a fully laid-out standard-cell CGRA implementation.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131274582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
High Performance High-Precision Floating-Point Operations on FPGAs Using OpenCL
2018 International Conference on Field-Programmable Technology (FPT) Pub Date: 2018-12-01 DOI: 10.1109/FPT.2018.00049
N. Nakasato, H. Daisaka, T. Ishikawa
{"title":"High Performance High-Precision Floating-Point Operations on FPGAs Using OpenCL","authors":"N. Nakasato, H. Daisaka, T. Ishikawa","doi":"10.1109/FPT.2018.00049","DOIUrl":"https://doi.org/10.1109/FPT.2018.00049","url":null,"abstract":"Development of high-level synthesis tools such as OpenCL SDK for FPGAs enables us to design accelerators for scientific applications that can take advantage of flexibility and efficiency of FPGAs. However, the available OpenCL SDKs only support the standard floating-point (FP) formats. In this paper, we present the performance evaluation of high precision FP operations, which are currently not supported in OpenCL, on recent FPGAs. By using a mechanism to call a custom design from an OpenCL kernel, we evaluate the performance of a sample application in high precision FP format binary128. We found that the sustained performance of our design in binary128 on Intel Arria10 and Stratix10 is 19 and 71 Gflops, respectively.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128124130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2