2019 International Conference on Field-Programmable Technology (ICFPT): Latest Papers, Page 7

Pipelined Parallel Finite Automata Evaluation

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00021

Vipula Sateesh, Connor Mckeon, Jared Winograd, A. DeHon

Citations: 5

A High Performance FPGA-Based Accelerator Design for End-to-End Speaker Recognition System

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00033

Ming-jun Jiao, Yue Li, Pengbo Dang, Wei Cao, Lingli Wang

Abstract: Speaker recognition is an important technique for identification applications. X-vectors, a robust text-independent speaker recognition system, spends considerable time extracting voiceprint features, owing to the heavy neural network computation involved, and scoring against every person registered in the database to find the best match. In this paper, an FPGA-based high-performance accelerator for this end-to-end speaker recognition system is proposed. It contains three parts: Mel Frequency Cepstral Coefficients (MFCC), a time delay neural network (TDNN), and a probabilistic linear discriminant analysis (PLDA) classifier. A quantitative analysis is presented to balance bit width against recognition accuracy. In addition, an optimization strategy is introduced to trade off system parallelism against FPGA resource utilization. The proposed accelerator, running at 200 MHz on the Xilinx XCVU9P FPGA of an UltraScale+ VCU118 board, achieves a peak performance of 1.067 TOP/s and 1.30×10^5 voice frames per second (vFPS). This is a 1296× speedup over the X-vectors software implementation running on a 2.5 GHz Intel Xeon E5-2620 processor, and 6.42× the energy efficiency of an Nvidia TITAN Xp GPU solution.

Citations: 1

ZyNet: Automating Deep Neural Network Implementation on Low-Cost Reconfigurable Edge Computing Platforms

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00058

Kizheppatt Vipin

Citations: 4

High-Level Synthesis Techniques to Generate Deeply Pipelined Circuits for FPGAs with Registered Routing

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00071

Yu Ting Chen, Jin Hee Kim, Ke-Xin Li, Graham Hoyes, J. Anderson

Citations: 6

Evolved Binary Neural Networks Through Harnessing FPGA Capabilities

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00076

Raul Valencia, Chiu-Wing Sham, O. Sinnen

Citations: 3

An Iterative Technique for Runtime Efficient Hardware-Software Partitioning

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00078

Deshya Wijesundera, Kisaru Liyanage, Alok Prakash, T. Srikanthan, Thilina Perera

Citations: 1

A Dataflow Pipelining Architecture for Tile Segmentation with a Sparse MobileNet on an FPGA

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00044

Youki Sada, Masayuki Shimoda, Akira Jinguji, Hiroki Nakahara

Abstract: Fast semantic segmentation on embedded systems is needed because of the growing interest in automatic driving, and energy efficiency is a fundamental metric in that scenario. Because high-resolution images are required to achieve high segmentation accuracy, an accelerator must provide large buffers for the corresponding feature maps, which does not suit the limited on-chip memory of an FPGA. To address this, we propose a tile segmentation algorithm and develop an FPGA-based accelerator for a sparse MobileNet-based PSPNet. To reduce the buffer size, the tile segmentation algorithm splits each incoming image into a number of tiles determined by the stride. The PSPNet then processes the resulting tiles. Moreover, we propose a pipelined sparse convolutional circuit to compute multiple tiles at high speed. We compared the proposed FPGA-based system with the NVIDIA RTX 2080 Ti using the Cityscapes benchmark. The FPGA achieved 139 FPS with 24 W power consumption for a 1024×512 image, and its accuracy (mIoU) was 64.2%. Compared with the GPU, it was 1.5 times faster, its power consumption was 9.3 times lower, and its performance per power consumption was 13.8 times better.

Citations: 1

Optimisation of System Throughput Exploiting Tasks Heterogeneity on Space Shared FPGAs

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00067

U. Minhas, R. Woods, G. Karakonstantis

Citations: 1

An OpenCL-Based FPGA Accelerator for Compressed YOLOv2

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00036

Anrong Yang, Yuanhui Li, Hongqiao Shu, Jianlin Deng, Chuanzhao Ma, Zheng Li, Qigang Wang

Abstract: Convolutional neural networks (CNNs) are widely used in computer vision applications. GPUs have been the mainstream accelerator for CNNs. Compared with GPUs, FPGAs offer high flexibility, low power consumption, and abundant DSP resources, which make it possible for them to surpass GPUs in some scenarios. Recent progress in high-level synthesis tools has greatly improved FPGA development efficiency. In this paper, an OpenCL-based CNN accelerator is designed for FPGA, and a variety of model compression techniques are applied to the YOLOv2 model. The accelerator uses the Winograd algorithm to implement convolution efficiently, and it solves the unaligned global memory access issue caused by the Winograd algorithm with an alignment stream buffer. The design makes full use of the available memory access bandwidth and utilizes all the available DSP resources. Parallelism is exploited in various dimensions for optimal performance. Our FPGA design reaches a latency of 10 ms per image, compared to 15 ms per image with an NVIDIA P100 GPU. We plan to make our design open source so that the community can benefit from it and contribute to it.

Citations: 4

Storage Mirroring for Bare-Metal Malware Analysis on FPGA Devices

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date : 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00061

D. C. Turicu, O. Creţ, L. Văcariu

Citations: 1