2019 International Conference on Field-Programmable Technology (ICFPT): Latest Publications

A High Energy-Efficiency FPGA-Based LSTM Accelerator Architecture Design by Structured Pruning and Normalized Linear Quantization

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date: 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00045

Yong Zheng, Haigang Yang, Zhihong Huang, Tianli Li, Yiping Jia

Abstract: LSTM (Long Short-Term Memory) is a recurrent neural network (RNN) architecture that has been successfully applied to domains that process sequential data, such as natural language processing (NLP) and speech recognition. In this work, we explore an approach to minimizing an FPGA-based design of the LSTM inference stage for high performance and energy efficiency. First, the model is pruned with permuted block-diagonal mask matrices to create a structured, hardware-friendly sparsity pattern. To compress the model further, we quantize the weights and activations using a normalized linear quantization approach. As a result, the network's computational workload is significantly reduced with negligible loss of accuracy. We then devise a hardware architecture that fully exploits the regular sparse structure. Implemented on an Arria 10 (10AX115U4F45I3SG) FPGA running at 150 MHz, our accelerator achieves a peak performance of 2.22 TOPS with a power dissipation of 1.679 W. Compared with previously reported FPGA-based LSTM accelerators, our approach achieves a 1.17-2.16x speedup.

Citations: 4
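The LSTM abstract names two compression steps without spelling them out. The sketch below is one plausible reading, not the authors' implementation. It assumes a "permuted block-diagonal" mask splits the weight matrix into a `num_blocks` × `num_blocks` grid and keeps exactly one dense block per block-row and block-column, chosen by a permutation `perm`. It also assumes "normalized linear quantization" means scaling values by their maximum magnitude and rounding uniformly to signed `bits`-bit integers. All function names are hypothetical.

```python
def permuted_block_diagonal_mask(rows, cols, num_blocks, perm):
    """Return a 0/1 mask that keeps block (i, perm[i]) of a num_blocks x num_blocks grid.

    Assumption: one reading of a "permuted block-diagonal" mask; every
    block-row and block-column keeps exactly one dense block.
    """
    if rows % num_blocks or cols % num_blocks:
        raise ValueError("rows and cols must be divisible by num_blocks")
    if sorted(perm) != list(range(num_blocks)):
        raise ValueError("perm must be a permutation of range(num_blocks)")
    block_rows, block_cols = rows // num_blocks, cols // num_blocks
    mask = [[0] * cols for _ in range(rows)]
    for i, j in enumerate(perm):
        # Block-row i keeps only block-column perm[i].
        for r in range(i * block_rows, (i + 1) * block_rows):
            for c in range(j * block_cols, (j + 1) * block_cols):
                mask[r][c] = 1
    return mask


def apply_mask(weights, mask):
    """Element-wise product: pruned weights become zero."""
    return [[w * m for w, m in zip(w_row, m_row)] for w_row, m_row in zip(weights, mask)]


def normalized_linear_quantize(values, bits):
    """Normalize by max |v|, then round uniformly to signed integers.

    Assumption: symmetric range [-(2**(bits-1) - 1), 2**(bits-1) - 1];
    values are recovered approximately as q * scale.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0] * len(values), 1.0
    scale = max_abs / qmax
    q = [round(v / max_abs * qmax) for v in values]
    return q, scale
```

With this mask, only 1/`num_blocks` of the weights survive, and each row's nonzeros fall in one contiguous block. That regularity is the kind of structure a hardware design can map onto fixed-size processing units.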

OBFS: OpenCL Based BFS Optimizations on Software Programmable FPGAs

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date: 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00056

Cheng Liu, Xinyu Chen, Bingsheng He, Xiaofei Liao, Ying Wang, Lei Zhang

Abstract: Breadth-first search (BFS) is a key building block of graph processing, and considerable effort has been devoted to accelerating BFS on FPGAs for both performance and energy efficiency. Prior work typically built BFS accelerators through handcrafted circuit design in a hardware description language (HDL). Despite relatively good performance, HDL-based design leads to extremely low design productivity and incurs high portability and maintenance costs. While high-level synthesis (HLS) tools make it convenient to create a functionally correct BFS accelerator, its performance can be much lower than that of a handcrafted HDL design. To obtain both near-handcrafted performance and software-like benefits such as portability and maintainability, we propose OBFS, an OpenCL-based BFS accelerator on software-programmable FPGAs. Observing that OpenCL-based FPGA designs are rather inefficient at irregular memory accesses, we propose techniques including data alignment, graph reordering, and batching to ensure coalesced memory accesses. In addition, we use the on-chip buffer to mitigate inefficient random DDR accesses. Finally, we move the random level update in BFS out of the main processing pipeline and overlap it with the subsequent BFS processing task. In our experiments on Intel Harp-v2, OBFS achieves average speedups of 9.5x and 5.5x over a vertex-centric and an edge-centric implementation, respectively. Compared with prior handcrafted designs, it achieves comparable or even better performance.

Citations: 6
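The OBFS abstract compares against vertex-centric and edge-centric implementations. The following sketch shows how those two baseline formulations of level-synchronous BFS typically differ. It is a generic illustration under our own assumptions, not OBFS itself: the vertex-centric version takes a directed graph in CSR form (`offsets`, `neighbors`), the edge-centric version takes an edge list, and the paper's alignment, reordering, batching, and buffering optimizations are omitted. Function names are hypothetical.

```python
def bfs_vertex_centric(offsets, neighbors, source):
    """Level-synchronous BFS over a CSR graph; returns a level per vertex (-1 = unreachable)."""
    n = len(offsets) - 1
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        next_frontier = []
        for v in frontier:
            # Data-dependent jump into the adjacency array: an irregular access.
            for u in neighbors[offsets[v]:offsets[v + 1]]:
                if level[u] == -1:
                    level[u] = depth + 1
                    next_frontier.append(u)
        frontier = next_frontier
        depth += 1
    return level


def bfs_edge_centric(num_vertices, edges, source):
    """Each iteration streams the whole edge list and expands vertices at the current depth."""
    level = [-1] * num_vertices
    level[source] = 0
    depth = 0
    changed = True
    while changed:
        changed = False
        for src, dst in edges:  # sequential scan of every edge, every level
            if level[src] == depth and level[dst] == -1:
                level[dst] = depth + 1
                changed = True
        depth += 1
    return level
```

The vertex-centric loop reads adjacency data at data-dependent offsets, while the edge-centric loop reads sequentially but rescans all edges at every level. Both write `level[...]` at data-dependent addresses, which is presumably the kind of random level update the abstract describes moving out of the main pipeline.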

Time-SWAD: A Dataflow Engine for Time-Based Single Window Stream Aggregation

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date: 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00017

Prajith Ramakrishnan Geethakumari, Vincenzo Gulisano, P. Trancoso, I. Sourdis

Abstract: High-throughput, low-latency streaming aggregation is essential for many applications that analyze massive volumes of data in real time. When incremental aggregation is wasteful or impossible, incoming data must be stored in a single sliding window before processing, which puts tremendous pressure on memory bandwidth. In addition, some problems call for time-based windows, defined by a time interval, in which the amount of data per window may vary; such windows are consequently more challenging to handle. This paper describes Time-SWAD, the first accelerator for time-based single-window stream aggregation. Time-SWAD is a dataflow engine (DFE) implemented on a Maxeler machine. It offers high processing throughput, up to 150 Mtuples/s, similar to that of related GPU systems, which, however, do not support windows that are both time-based and single. It receives incoming data directly from the network and has direct access to off-chip DRAM, enabling ultra-low processing latency of 1-10 µs, at least four orders of magnitude lower than software-based solutions.

Citations: 3
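To make the window semantics in the Time-SWAD abstract concrete, the sketch below keeps every tuple whose timestamp falls within the last `size` time units and recomputes a median over the whole window on each query. The median stands in for an aggregation that is not easily computed incrementally, so all tuples must be buffered. The half-open window boundary, in-order arrival, and the class name are our assumptions. This is a software model of the semantics, not the DFE design.

```python
import statistics
from collections import deque


class TimeBasedWindow:
    """Single time-based sliding window holding tuples with timestamp in (now - size, now].

    Assumption: tuples arrive in non-decreasing timestamp order.
    """

    def __init__(self, size):
        self.size = size
        self.buffer = deque()  # (timestamp, value) pairs, oldest first

    def insert(self, timestamp, value):
        self.buffer.append((timestamp, value))
        # Evict tuples that have slid out of the window; how many depends on arrival times.
        while self.buffer and self.buffer[0][0] <= timestamp - self.size:
            self.buffer.popleft()

    def median(self):
        # Non-incremental aggregate: it needs every buffered value, not a running summary.
        return statistics.median([value for _, value in self.buffer])
```

The number of buffered tuples changes from one insertion to the next, which is the per-window variability the abstract identifies as the difficulty of time-based windows.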

Image Processing and Vehicles – Using FPGA to Reduce Latency of Time Critical Tasks

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date: 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00097

A. Yeo, Damon Hill, Anzhen Huang, Xueao Liu, G. Dong, D. Bailey

Citations: 0

Autonomous Vehicle Driving Using the Stream-Based Real-Time Hardware Line Detector

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date: 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00093

Taito Manabe, Naofumi Yoshinaga, Yuta Imamura, Taichi Saikai, Koki Fujita, Masatomo Matsuda, Kotoko Miyata, Tatsuma Mori, Yuichiro Shibata, H. Egawa, Yuichi Kawamata, Tomohiro Kida, Ryouhei Tsugami, Ryohei Kakizaki, Taichi Katayama, Koki Tomonaga, Shota Fukui

Abstract: Achieving level 5 autonomous driving, which enables a fully driverless vehicle, requires image recognition ability close to the human level, since most of the information required for safe driving, such as traffic lanes and signs, is currently provided visually. Although image recognition covers various technologies, this paper focuses on line detection, which is particularly useful for lane keeping. To achieve real-time line detection with lower latency and power consumption, we adopt a stream-based hardware implementation on an FPGA. The line segment detector (LSD) is a line detection algorithm based on intensity gradients that outperforms the well-known Hough transform in both processing speed and accuracy. However, implementing LSD on FPGAs in a streaming manner is difficult because of its iterative approach. We therefore propose a simple, stream-friendly line detection algorithm based on LSD. Evaluation results show that the implemented system is compact while sustaining 60 fps throughput on VGA video. This paper also introduces other components used to build an autonomous driving system.

Citations: 1
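The line-detection abstract notes that LSD is gradient-based and hard to stream because of its iterative approach. Its first stage, gradient computation, streams easily. The sketch below assumes a 2×2 finite-difference mask of the kind used in the original LSD formulation. It buffers only one previous image row, so pixels can be processed as rows arrive. It does not reproduce the paper's stream-friendly algorithm, it omits LSD's later iterative region-growing stage, and the function name is hypothetical.

```python
import math


def stream_gradients(rows):
    """Yield (y, x, magnitude, level_line_angle) for each 2x2 window, one image row at a time.

    y is the index of the window's top row. Only the previous row is kept (a line buffer).
    """
    prev = None
    for y, row in enumerate(rows):
        if prev is not None:
            for x in range(len(row) - 1):
                a, b = prev[x], prev[x + 1]  # top-left, top-right
                c, d = row[x], row[x + 1]    # bottom-left, bottom-right
                gx = (b + d - a - c) / 2.0   # horizontal intensity change
                gy = (c + d - a - b) / 2.0   # vertical intensity change
                magnitude = math.hypot(gx, gy)
                # Level-line direction is perpendicular to the gradient.
                angle = math.atan2(gx, -gy)
                yield y - 1, x, magnitude, angle
        prev = row
```

Each output pairs a gradient magnitude with a level-line angle. A downstream stage could group neighboring pixels with similar angles into line-segment candidates.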

Efficient OS Hardware Accelerators Preemption Management in FPGA

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date: 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00069

Ye Tian, Jean-Christophe Prévotet, F. Nouvel

Citations: 1

Autonomous Vehicle Development Using FPGA for Image Processing

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date: 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00090

Hamish Simmonds, Nicholas Carlisle, Xue Li, Fanglin Mu, D. Bailey

Citations: 3

Hybrid Network Utilization for Efficient Communication in a Tightly Coupled FPGA Cluster

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date: 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00068

Tomohiro Ueno, Takaaki Miyajima, Antoniette Mondigo, K. Sano

Citations: 4

An OpenCL-Based Hybrid CNN-RNN Inference Accelerator On FPGA

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date: 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00048

Yunfei Sun, Brian Liu, Xianchao Xu

Abstract: Convolutional neural networks (CNNs), recurrent neural networks (RNNs), and hybrid CNN-RNN networks have recently shown great success in many deep learning scenarios. Although many FPGA accelerators dedicated to a particular kind of network have been proposed, few combine CNN and RNN acceleration. In this paper, we propose a high-throughput, resource-efficient CNN-RNN fusion accelerator on FPGA, built with commercial OpenCL, to support general-purpose DNNs. It uses a novel streaming architecture and mapping strategy to implement the most computation-intensive and resource-demanding parts of DNNs on the same computation logic. Through this hardware reuse, it achieves resource efficiency when accelerating CNNs, RNNs, and their hybrid networks. The accelerator follows a layer-by-layer, subgraph-by-subgraph, or subnetwork-by-subnetwork execution mode, which allows it to deploy most DNNs flexibly at runtime with the best performance. YOLOv2, LSTM, and CRNN are tested on an Intel Arria 10 GX1150 FPGA. The accelerator achieves 646 GOPS on CRNN, the best performance on CNN-RNN hybrid networks among high-level synthesis (HLS) based FPGA accelerators. Moreover, its throughput on CNNs and RNNs is competitive with state-of-the-art specialized FPGA accelerators.

Citations: 10
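The CNN-RNN accelerator's abstract describes running the most computation-intensive parts of both network types on the same computation logic. One common way to see why this is possible is that both a convolution (after an im2col rearrangement) and the four LSTM gates reduce to matrix-vector products. The sketch below routes both through a single hypothetical `matvec` routine. It illustrates the reuse idea only and is not the paper's streaming architecture or mapping strategy. The stacked gate order (i, f, g, o) and the bias-as-last-column layout are assumptions.

```python
import math


def matvec(weights, vec):
    """Shared compute kernel: dense matrix-vector product."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def conv1d_via_matvec(signal, kernels):
    """1-D valid cross-correlation mapped onto matvec via im2col.

    kernels is a list of output channels, each a list of k taps;
    the result is out[t][channel].
    """
    k = len(kernels[0])
    out = []
    for start in range(len(signal) - k + 1):
        patch = signal[start:start + k]  # one im2col column
        out.append(matvec(kernels, patch))
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step_via_matvec(weights, x, h, c):
    """One LSTM time step; all four gates come from a single matvec on [x, h, 1].

    Assumption: weights has 4 * hidden rows stacked as (i, f, g, o), and its
    last column holds the bias.
    """
    hidden = len(h)
    z = matvec(weights, x + h + [1.0])
    i = [sigmoid(v) for v in z[0:hidden]]
    f = [sigmoid(v) for v in z[hidden:2 * hidden]]
    g = [math.tanh(v) for v in z[2 * hidden:3 * hidden]]
    o = [sigmoid(v) for v in z[3 * hidden:4 * hidden]]
    c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
    h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
    return h_new, c_new
```

In hardware terms, `matvec` plays the role of the shared compute array. The element-wise activations are the only layer-specific work, and they are cheap by comparison.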

A Machine Learning Approach for Power Gating the FPGA Routing Network

2019 International Conference on Field-Programmable Technology (ICFPT) Pub Date: 2019-12-01 DOI: 10.1109/ICFPT47387.2019.00010

Zeinab Seifoori, H. Asadi, Mirjana Stojilović

Citations: 5