Fine-Grained Acceleration of Binary Neural Networks Using Intel® Xeon® Processor with Integrated FPGA
Philip Colangelo, Randy Huang, Enno Lübbers, M. Margala, Kevin Nealis
2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
DOI: 10.1109/FCCM.2017.46 · Published: 2017-04-01
Citations: 6
Abstract
Binary weighted networks (BWN) for image classification reduce computation for convolutional neural networks (CNN) from multiply-adds to accumulates with little to no accuracy loss. Hardware architectures such as FPGAs can take full advantage of BWN computations because of their ability to express weights represented as 0 and 1 efficiently through customizable logic. In this paper, we present an implementation on Intel's Xeon® processor with integrated FPGA to accelerate binary weighted networks. We interface Intel's Accelerator Abstraction Layer (AAL) with Caffe to provide a robust framework for accelerating CNNs. Utilizing the low-latency Quick Path Interconnect (QPI) between the Broadwell Xeon® processor and Arria 10 FPGA, we can perform fine-grained offloads for specific portions of the network. Because convolution layers make up most of the computation in our experiments, we offload the feature and weight data to customized binary hardware in the FPGA for faster execution. An initial proof-of-concept design shows that by using the Xeon processor and FPGA together we can improve throughput by 2x on some layers and by 1.3x overall while utilizing only a small percentage of FPGA core logic.
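To illustrate the core idea of the abstract — that binarized weights turn multiply-adds into pure accumulates — here is a minimal NumPy sketch. It is not the paper's implementation (the paper offloads this to FPGA logic); the sign-based binarization with a scaling factor `alpha` is a common BWN scheme assumed here for illustration.

```python
import numpy as np

def binarize_weights(w):
    # Binarize real-valued weights to +1/-1 via sign, keeping a
    # per-filter scaling factor alpha = mean(|w|) (a common BWN scheme,
    # assumed here; the paper's exact scheme may differ).
    alpha = np.abs(w).mean()
    return np.sign(w), alpha

def binary_dot(x, w_bin, alpha):
    # With weights restricted to +1/-1, the dot product needs no
    # multiplications: accumulate inputs where the weight is +1,
    # subtract the accumulation where it is -1, then scale once.
    pos = x[w_bin > 0].sum()
    neg = x[w_bin < 0].sum()
    return alpha * (pos - neg)

rng = np.random.default_rng(0)
x = rng.standard_normal(16)   # feature data (would be streamed to the FPGA)
w = rng.standard_normal(16)   # full-precision weights before binarization

w_bin, alpha = binarize_weights(w)
print(binary_dot(x, w_bin, alpha))   # accumulate-only result
print(alpha * (x @ w_bin))           # same value via multiply-adds
```

On an FPGA, the `w_bin > 0` / `w_bin < 0` selection maps to a multiplexer per weight bit, which is why customizable logic expresses 0/1 weights so cheaply.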