2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM): Latest Publications

Towards Efficient and Scalable Acceleration of Online Decision Tree Learning on FPGA
Zhe Lin, Sharad Sinha, Wei Zhang
{"title":"Towards Efficient and Scalable Acceleration of Online Decision Tree Learning on FPGA","authors":"Zhe Lin, Sharad Sinha, Wei Zhang","doi":"10.1109/FCCM.2019.00032","DOIUrl":"https://doi.org/10.1109/FCCM.2019.00032","url":null,"abstract":"Decision trees are machine learning models commonly used in various application scenarios. In the era of big data, traditional decision tree induction algorithms are not suitable for learning large-scale datasets due to their stringent data storage requirement. Online decision tree learning algorithms have been devised to tackle this problem by concurrently training with incoming samples and providing inference results. However, even the most up-to-date online tree learning algorithms still suffer from either high memory usage or high computational intensity with dependency and long latency, making them challenging to implement in hardware. To overcome these difficulties, we introduce a new quantile-based algorithm to improve the induction of the Hoeffding tree, one of the state-of-the-art online learning models. The proposed algorithm is light-weight in terms of both memory and computational demand, while still maintaining high generalization ability. A series of optimization techniques dedicated to the proposed algorithm have been investigated from the hardware perspective, including coarse-grained and fine-grained parallelism, dynamic and memory-based resource sharing, pipelining with data forwarding. We further present a high-performance, hardware-efficient and scalable online decision tree learning system on a field-programmable gate array (FPGA) with system-level optimization techniques. Experimental results show that our proposed algorithm outperforms the state-of-the-art Hoeffding tree learning method, leading to 0.05% to 12.3% improvement in inference accuracy. Real implementation of the complete learning system on the FPGA demonstrates a 384x to 1581x speedup in execution time over the state-of-the-art design.","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130283946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
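The Hoeffding tree that this work builds on grows a tree incrementally, splitting a leaf only once the observed information-gain gap between the two best attributes exceeds the Hoeffding bound. The abstract does not spell out the authors' quantile-based variant, so the sketch below only illustrates that underlying split test; the function names and default parameters are illustrative.

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the observed mean of n
    samples lies within epsilon of the true mean for a statistic whose
    range is value_range."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_split(gain_best: float, gain_second: float, n_samples: int,
                 delta: float = 1e-7, gain_range: float = 1.0,
                 tie_threshold: float = 0.05) -> bool:
    """Split a leaf once the best attribute's gain beats the runner-up by
    more than the Hoeffding bound, or once the bound is small enough to
    treat the two attributes as tied."""
    eps = hoeffding_bound(gain_range, delta, n_samples)
    return (gain_best - gain_second > eps) or (eps < tie_threshold)

# Example: after 1000 samples the best attribute's gain leads by 0.12.
print(should_split(gain_best=0.31, gain_second=0.19, n_samples=1000))
```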
KPynq: A Work-Efficient Triangle-Inequality Based K-Means on FPGA
Yuke Wang, Zhaorui Zeng, Boyuan Feng, Lei Deng, Yufei Ding
{"title":"KPynq: A Work-Efficient Triangle-Inequality Based K-Means on FPGA","authors":"Yuke Wang, Zhaorui Zeng, Boyuan Feng, Lei Deng, Yufei Ding","doi":"10.1109/FCCM.2019.00061","DOIUrl":"https://doi.org/10.1109/FCCM.2019.00061","url":null,"abstract":"K-means is a popular but computation-intensive algorithm for unsupervised learning. To address this issue, we present KPynq, a work-efficient triangle-inequality based K-means on FPGA for handling large-size, high-dimension datasets. KPynq leverages an algorithm-level optimization to balance the performance and computation irregularity, and a hardware architecture design to fully exploit the pipeline and parallel processing capability of various FPGAs. In the experiment, KPynq consistently outperforms the CPU-based standard K-means in terms of its speedup (up to 4.2x) and significant energy efficiency (up to 218x).","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"346 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121468274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
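The work efficiency claimed for KPynq comes from triangle-inequality pruning: when a point is already close to its current center relative to the distance between centers, other centers cannot be nearer, and their distance computations can be skipped. Below is a minimal software sketch of that pruning rule, assuming a simple NumPy data layout rather than the authors' FPGA datapath.

```python
import numpy as np

def assign_with_triangle_pruning(points, centers):
    """Assign each point to its nearest center, skipping distance
    computations ruled out by the triangle inequality: if
    d(x, c_best) <= 0.5 * d(c_best, c_other), then c_other cannot be
    closer than c_best."""
    # Pairwise center-to-center distances, computed once per iteration.
    center_dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    labels = np.zeros(len(points), dtype=int)
    skipped = 0
    for i, x in enumerate(points):
        best = 0
        best_dist = np.linalg.norm(x - centers[0])
        for j in range(1, len(centers)):
            # Triangle-inequality test: prune centers that cannot win.
            if best_dist <= 0.5 * center_dists[best, j]:
                skipped += 1
                continue
            d = np.linalg.norm(x - centers[j])
            if d < best_dist:
                best, best_dist = j, d
        labels[i] = best
    return labels, skipped

rng = np.random.default_rng(0)
pts = rng.normal(size=(1000, 16))
ctrs = rng.normal(size=(8, 16))
labels, skipped = assign_with_triangle_pruning(pts, ctrs)
print(f"skipped {skipped} of {len(pts) * (len(ctrs) - 1)} candidate distance computations")
```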
SimBNN: A Similarity-Aware Binarized Neural Network Acceleration Framework
Cheng Fu, Shilin Zhu, Huili Chen, F. Koushanfar, Hao Su, Jishen Zhao
{"title":"SimBNN: A Similarity-Aware Binarized Neural Network Acceleration Framework","authors":"Cheng Fu, Shilin Zhu, Huili Chen, F. Koushanfar, Hao Su, Jishen Zhao","doi":"10.1109/FCCM.2019.00060","DOIUrl":"https://doi.org/10.1109/FCCM.2019.00060","url":null,"abstract":"Binarized Neural Networks (BNNs) eliminate bitwidth redundancy in Convolutional Neural Networks (CNNs) by using a single bit (-1/+1) for network parameters and intermediate representations. This greatly reduces off-chip data transfer and storage overhead. However, considerable computation redundancy remains in BNN inference. To tackle this problem, we investigate the similarity property in input data and kernel weights. We identify an average of 79% input similarity and 61% kernel similarity measured by our proposed metric across common network architectures. Motivated by this observation, we propose SimBNN, a fast and energy-efficient acceleration framework for BNN inference that leverages similarity properties. SimBNN consists of a set of similarity-aware accelerators, a weight reuse optimization algorithm, and a similarity selection mechanism. SimBNN incorporates two types of BNN accelerators, which exploit the input similarity and kernel similarity, respectively. More specifically, the result from the previous stage is reused if similarity is identified, thus significantly reducing BNN computation overhead. Furthermore, we propose a weight reuse optimization algorithm, which increases the weight similarity by off-line re-ordering weight kernels. Finally, our framework provides a systematic method to determine the optimal strategy between input data and kernel weights reuse, based on the similarity characteristics of input data and pre-trained BNNs.","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122991661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
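A BNN dot product reduces to XNOR plus popcount over bit-packed vectors, so when consecutive inputs (or kernels) differ in only a few bits, the previous result can be corrected instead of recomputed, which is the kind of reuse SimBNN exploits. The sketch below shows that incremental correction for a single bit-packed row; the bit encoding and function names are illustrative, not the authors' accelerator design.

```python
def bnn_dot(x: int, w: int, n_bits: int) -> int:
    """Binarized dot product of two n_bits-wide bit-packed vectors
    (bit 1 encodes +1, bit 0 encodes -1): matches minus mismatches."""
    mismatches = bin((x ^ w) & ((1 << n_bits) - 1)).count("1")
    return n_bits - 2 * mismatches

def bnn_dot_reuse(prev_dot: int, x_prev: int, x_new: int, w: int, n_bits: int) -> int:
    """Update a previously computed dot product when only a few input bits
    changed: each flipped bit turns a match into a mismatch or vice versa,
    shifting the result by -2 or +2."""
    mask = (1 << n_bits) - 1
    flipped = (x_prev ^ x_new) & mask
    was_mismatch = flipped & (x_prev ^ w)          # these become matches: +2 each
    was_match = flipped & ~(x_prev ^ w) & mask     # these become mismatches: -2 each
    return prev_dot + 2 * bin(was_mismatch).count("1") - 2 * bin(was_match).count("1")

# 64-bit example: the incremental update matches a full recomputation.
n = 64
w = 0x0F0F_0F0F_0F0F_0F0F
x0 = 0x1234_5678_9ABC_DEF0
x1 = x0 ^ 0b1011            # new input differs from x0 in three bit positions
full = bnn_dot(x1, w, n)
incremental = bnn_dot_reuse(bnn_dot(x0, w, n), x0, x1, w, n)
assert full == incremental
print(full, incremental)
```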
Welcome Message from the General and Program Chairs
{"title":"Welcome Message from the General and Program Chairs","authors":"","doi":"10.1109/fccm.2019.00005","DOIUrl":"https://doi.org/10.1109/fccm.2019.00005","url":null,"abstract":"","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124013570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Safe Task Interruption for FPGAs
Sameh Attia, Vaughn Betz
{"title":"Safe Task Interruption for FPGAs","authors":"Sameh Attia, Vaughn Betz","doi":"10.1109/FCCM.2019.00070","DOIUrl":"https://doi.org/10.1109/FCCM.2019.00070","url":null,"abstract":"Saving and restoring the state of an FPGA task in an orderly manner is essential for enabling hardware checkpointing and context switching. However, it requires task interruption, and stopping a task at an arbitrary time can cause several hazards including deadlock and data loss. In this work, we build a context switching simulator to simulate and identify these hazards. In addition, we introduce design rules that should be followed to achieve safe task interruption, and propose task wrappers that can be placed around an FPGA task to implement these rules.","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115498163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Impact of FPGA Architecture on Area and Performance of CGRA Overlays
Ian Taras, J. Anderson
{"title":"Impact of FPGA Architecture on Area and Performance of CGRA Overlays","authors":"Ian Taras, J. Anderson","doi":"10.1109/FCCM.2019.00022","DOIUrl":"https://doi.org/10.1109/FCCM.2019.00022","url":null,"abstract":"Coarse-grained reconfigurable arrays (CGRAs) are programmable logic devices with ALU-style processing elements and datapath interconnect. CGRAs can be realized as custom ASICs or implemented on FPGAs as overlays . A key element of CGRAs is that they are typically software programmable with rapid compile times – an advantage arising from their coarse-grained characteristics, simplifying CAD mapping tasks. We implement two previously published CGRAs as overlays on two commercial FPGAs (Intel and Xilinx), and consider the impact of the underlying FPGA architecture on the CGRA area and performance. We present optimizations for the overlays to take advantage of the FPGA architectural features and show a peak performance improvement of 1.93x, as well as maximum area savings of 31.1% and 48.5% for Intel and Xilinx, respectively, relative to a naive first-cut implementation. We also present a novel technique for a configurable multiplexer implementation, which embeds the select signals into SRAM configuration, saving 35.7% in area. The research is conducted using the open-source CGRA-ME (modeling and exploration) framework [1].","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125050455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
Raparo: Resource-Level Angle-Based Parallel Routing for FPGAs
Minghua Shen, Nong Xiao
{"title":"Raparo: Resource-Level Angle-Based Parallel Routing for FPGAs","authors":"Minghua Shen, Nong Xiao","doi":"10.1109/FCCM.2019.00053","DOIUrl":"https://doi.org/10.1109/FCCM.2019.00053","url":null,"abstract":"Routing is a time-consuming step in the FPGA compilation flow. The parallelization of routing has the potential to reduce the time but imposes the dependent problem as the inherent order of nets. In this paper, we present Raparo, a resource-level angle-based parallel router. Raparo exploits angle-based region partitioning to drive the assignment of the nets for efficient parallel routing on the multi-core processor systems. Raparo parallelizes the routing at resource level rather than region level for the similar convergence as the serial router. Results show that Raparo can scale to 32 processor cores to provide about 16x speedup on average with acceptable impacts on the quality of results, comparing to the serial router.","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129921478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
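The abstract does not detail how the angle-based partitioning is computed; one plausible reading, sketched below purely as an assumption with illustrative names and data layout, is to bucket nets into angular sectors around the chip center so that each worker routes one sector independently.

```python
import math
from collections import defaultdict

def partition_nets_by_angle(nets, chip_center, num_workers):
    """Group nets into angular sectors around the chip center so each worker
    routes one sector. 'nets' maps a net name to the (x, y) centroid of its
    terminals (an illustrative data model)."""
    sectors = defaultdict(list)
    sector_width = 2 * math.pi / num_workers
    for name, (x, y) in nets.items():
        angle = math.atan2(y - chip_center[1], x - chip_center[0]) % (2 * math.pi)
        sectors[int(angle // sector_width)].append(name)
    return sectors

nets = {"n0": (10, 3), "n1": (-4, 8), "n2": (-6, -7), "n3": (9, -2)}
print(partition_nets_by_angle(nets, chip_center=(0, 0), num_workers=4))
```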
An FPGA-Based Computing Infrastructure Tailored to Efficiently Scaffold Genome Sequences
Alberto Zeni, M. Crespi, Lorenzo Di Tucci, M. Santambrogio
{"title":"An FPGA-Based Computing Infrastructure Tailored to Efficiently Scaffold Genome Sequences","authors":"Alberto Zeni, M. Crespi, Lorenzo Di Tucci, M. Santambrogio","doi":"10.1109/FCCM.2019.00074","DOIUrl":"https://doi.org/10.1109/FCCM.2019.00074","url":null,"abstract":"In the current years broad access to genomic data is leading to improve the understanding and prevention of human diseases as never before. De-novo genome assembly, represents a main obstacle to perform the analysis on a large scale, as it is one of the most time-consuming phases of the genome analysis. In this paper, we present a scalable, high performance and energy efficient architecture for the alignment step of SSPACE, a state of the art tool used to perform scaffolding also in case of de-novo assembly. The final architecture is able to achieve up to 9.83x speedup in performance when compared to the software version of Bowtie, a state of the art tool used by SSPACE to perform the alignment.","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127082565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Maverick: A Stand-Alone CAD Flow for Partially Reconfigurable FPGA Modules
D. Glick, Jesse Grigg, B. Nelson, M. Wirthlin
{"title":"Maverick: A Stand-Alone CAD Flow for Partially Reconfigurable FPGA Modules","authors":"D. Glick, Jesse Grigg, B. Nelson, M. Wirthlin","doi":"10.1109/FCCM.2019.00012","DOIUrl":"https://doi.org/10.1109/FCCM.2019.00012","url":null,"abstract":"This paper presents Maverick, a proof-of-concept computer-aided design (CAD) flow for generating reconfigurable modules (RMs) which target partial reconfiguration (PR) regions in field-programmable gate array (FPGA) designs. After an initial static design and PR region are created with Xilinx's Vivado PR flow, the Maverick flow can then compile and configure RMs onto that PR region—without the use of vendor tools. Maverick builds upon existing open source tools (Yosys, RapidSmith2, and Project X-Ray) to form an end-to-end compilation flow. This paper describes the Maverick flow and shows the results of it running on a PYNQ-Z1's ARM processor to compile a set of HDL designs to partial bitstreams. The resulting bitstreams were configured onto the PYNQ-Z1's FPGA fabric, demonstrating the feasibility of a single-chip embedded system which can both compile HDL designs to bitstreams and then configure them onto its own programmable fabric.","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128127421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
High Precision, High Performance FPGA Adders
M. Langhammer, B. Pasca, Gregg Baeckler
{"title":"High Precision, High Performance FPGA Adders","authors":"M. Langhammer, B. Pasca, Gregg Baeckler","doi":"10.1109/FCCM.2019.00047","DOIUrl":"https://doi.org/10.1109/FCCM.2019.00047","url":null,"abstract":"FPGAs are now being commonly used in the datacenter as smart Network Interface Cards (NICs), with cryptography as one of the strategic application areas. Public key cryptography algorithms in particular require arithmetic with thousands of bits of precision. Even an operation as simple as addition can be difficult for the FPGA when dealing with large integers, because of the high resource count and high latency needed to achieve usable performance levels with known methods. This paper examines the architecture and implementation of high-performance integer adders on FPGAs for widths ranging from 1024 to 8192 bits, in both single-instance and many-core chip-filling configurations. For chip-filling designs the routing impact of these wide busses are assessed, as they often have an impact outside the immediate locality of the structures. The architectures presented in this work show 1 to 2 orders magnitude reduction in the area-latency product over commonly used approaches. Routing congestion is managed, with near 100% logic efficiency (packing) for the adder function. Performance for these largely automatically placed designs are approximately the same as for carefully floor-planned non-arithmetic applications. In one example design, we show a 2048 bit adder in 5021 ALMs, with a latency of 6 clock cycles, at 628 MHz in a Stratix 10 E-2 device.","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130468049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
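The central difficulty described here, carry propagation across thousands of bits, is commonly addressed by splitting the operands into narrower chunks and pipelining the carry from one chunk to the next so that each stage only resolves a narrow addition. The sketch below models that decomposition in software; the 256-bit chunk width and function names are illustrative and not taken from the paper.

```python
def wide_add_chunked(a: int, b: int, width: int = 2048, chunk: int = 256):
    """Add two width-bit integers chunk by chunk, propagating a single carry
    bit between chunks, mimicking how a deeply pipelined FPGA adder resolves
    one narrow segment per pipeline stage."""
    mask = (1 << chunk) - 1
    carry, result = 0, 0
    for stage in range(width // chunk):
        a_part = (a >> (stage * chunk)) & mask
        b_part = (b >> (stage * chunk)) & mask
        s = a_part + b_part + carry
        result |= (s & mask) << (stage * chunk)
        carry = s >> chunk          # at most 1; feeds the next stage
    return result, carry

# The chunked result matches a full-width addition modulo 2**2048.
a = (1 << 2048) - 12345
b = 98765
total, carry_out = wide_add_chunked(a, b)
assert total == (a + b) % (1 << 2048) and carry_out == ((a + b) >> 2048)
print(carry_out)
```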