Fine-Grained Acceleration of Binary Neural Networks Using Intel® Xeon® Processor with Integrated FPGA
Philip Colangelo, Randy Huang, Enno Lübbers, M. Margala, Kevin Nealis
2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
DOI: 10.1109/FCCM.2017.46 · Published: 2017-04-01
Citations: 6
Abstract
Binary weighted networks (BWN) for image classification reduce computation for convolutional neural networks (CNN) from multiply-adds to accumulates with little to no accuracy loss. Hardware architectures such as FPGAs can take full advantage of BWN computations because of their ability to express weights represented as 0 and 1 efficiently through customizable logic. In this paper, we present an implementation on Intel's Xeon® processor with integrated FPGA to accelerate binary weighted networks. We interface Intel's Accelerator Abstraction Layer (AAL) with Caffe to provide a robust framework for accelerating CNNs. Utilizing the low-latency Quick Path Interconnect (QPI) between the Broadwell Xeon® processor and Arria 10 FPGA, we can perform fine-grained offloads for specific portions of the network. Because convolution layers make up most of the computation in our experiments, we offload the feature and weight data to customized binary hardware in the FPGA for faster execution. An initial proof-of-concept design shows that by using the Xeon processor and FPGA together we can improve throughput by 2x on some layers and by 1.3x overall while utilizing only a small percentage of FPGA core logic.
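To illustrate the core idea of the abstract — that binarized weights turn multiply-adds into pure accumulates — here is a minimal NumPy sketch. It is not the paper's implementation (the paper offloads this to FPGA logic); the sign-based binarization with a scaling factor `alpha` is a common BWN scheme assumed here for illustration.

```python
import numpy as np

def binarize_weights(w):
    # Binarize real-valued weights to +1/-1 via sign, keeping a
    # per-filter scaling factor alpha = mean(|w|) (a common BWN scheme,
    # assumed here; the paper's exact scheme may differ).
    alpha = np.abs(w).mean()
    return np.sign(w), alpha

def binary_dot(x, w_bin, alpha):
    # With weights restricted to +1/-1, the dot product needs no
    # multiplications: accumulate inputs where the weight is +1,
    # subtract the accumulation where it is -1, then scale once.
    pos = x[w_bin > 0].sum()
    neg = x[w_bin < 0].sum()
    return alpha * (pos - neg)

rng = np.random.default_rng(0)
x = rng.standard_normal(16)   # feature data (would be streamed to the FPGA)
w = rng.standard_normal(16)   # full-precision weights before binarization

w_bin, alpha = binarize_weights(w)
print(binary_dot(x, w_bin, alpha))   # accumulate-only result
print(alpha * (x @ w_bin))           # same value via multiply-adds
```

On an FPGA, the `w_bin > 0` / `w_bin < 0` selection maps to a multiplexer per weight bit, which is why customizable logic expresses 0/1 weights so cheaply.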