Application of convolutional neural networks on Intel® Xeon® processor with integrated FPGA

2017 IEEE High Performance Extreme Computing Conference (HPEC), Pub Date: 2017-09-01, DOI: 10.1109/HPEC.2017.8091025

Philip Colangelo, Enno Lübbers, Randy Huang, M. Margala, Kevin Nealis


Citations: 13

Abstract

Intel®'s Xeon® processor with integrated FPGA is a new research platform that combines all the capabilities of a Broadwell Xeon processor with an Arria 10 FPGA in the same package. In this paper, we present an implementation on this platform that showcases the effectiveness of using both hardware architectures together to accelerate a convolutional neural network (CNN). We choose a network topology that uses binary weights and low-precision activation data to take advantage of the customizable fabric provided by the FPGA. Compared to standard multiply-accumulate CNNs, binary weighted networks (BWNs) reduce the amount of computation by eliminating the need for multiplication, with little to no degradation in classification accuracy. Coupling Intel's Open Programmable Acceleration Engine (OPAE) with Caffe provides a robust framework that we used as the foundation for our application. Because the convolution primitives account for most of the computation in our network, we offload the feature and weight data to a customized binary convolution accelerator loaded in the FPGA. By employing the low-latency QuickPath Interconnect (QPI) that bridges the Broadwell Xeon processor and the Arria 10 FPGA, we can carry out fine-grained offloads while avoiding bandwidth bottlenecks. An initial proof-of-concept design that uses only a portion of the FPGA core logic shows that using the Xeon processor and the FPGA together improves throughput by 2× on some layers and by 1.3× overall.
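
The claim that binary weights remove the need for multiplication can be illustrated with a small software sketch. The code below is not the accelerator described in the paper; it is a minimal pure-Python illustration, assuming weights restricted to +1 and -1 and small integer activations. The function names `bwn_conv2d` and `mac_conv2d` are hypothetical and chosen only for this example. Each output value is built by adding or subtracting activations according to the weight sign, and the result is compared against an ordinary multiply-accumulate version.

```python
def bwn_conv2d(feature, weight_signs):
    """Valid 2D cross-correlation with weights restricted to +1 / -1.

    feature: list of rows of integers (stand-in for low-precision activations).
    weight_signs: list of rows of booleans; True encodes +1, False encodes -1.
    Hypothetical helper for illustration, not the paper's accelerator.
    """
    kh, kw = len(weight_signs), len(weight_signs[0])
    oh = len(feature) - kh + 1
    ow = len(feature[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    a = feature[y + i][x + j]
                    # Binary weight: add or subtract, never multiply.
                    if weight_signs[i][j]:
                        acc += a
                    else:
                        acc -= a
            out[y][x] = acc
    return out


def mac_conv2d(feature, weights):
    """Reference multiply-accumulate version with explicit +1 / -1 weights."""
    kh, kw = len(weights), len(weights[0])
    oh = len(feature) - kh + 1
    ow = len(feature[0]) - kw + 1
    return [
        [
            sum(
                feature[y + i][x + j] * weights[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for x in range(ow)
        ]
        for y in range(oh)
    ]


feature = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
signs = [[True, True], [False, True]]
weights = [[1 if s else -1 for s in row] for row in signs]

result = bwn_conv2d(feature, signs)
reference = mac_conv2d(feature, weights)
print(result)     # [[4, 6], [10, 12]]
print(reference)  # same values, computed with multiplications
```

In hardware, the add-or-subtract selection maps to simple adder logic rather than multipliers, which is one reason a binary-weight topology suits FPGA fabric.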

