SNNAP: Approximate computing on programmable SoCs via neural acceleration
T. Moreau, Mark Wyse, J. Nelson, Adrian Sampson, H. Esmaeilzadeh, L. Ceze, M. Oskin
2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pp. 603-614
Published: 2015-03-09 · DOI: 10.1109/HPCA.2015.7056066
Citations: 136
Abstract
Many applications that can take advantage of accelerators are amenable to approximate execution. Past work has shown that neural acceleration is a viable way to accelerate approximate code. In light of the growing availability of on-chip field-programmable gate arrays (FPGAs), this paper explores neural acceleration on off-the-shelf programmable SoCs. We describe the design and implementation of SNNAP, a flexible FPGA-based neural accelerator for approximate programs. SNNAP is designed to work with a compiler workflow that configures the neural network's topology and weights instead of the programmable logic of the FPGA itself. This approach enables effective use of neural acceleration in commercially available devices and accelerates different applications without costly FPGA reconfigurations. No hardware expertise is required to accelerate software with SNNAP, so the effort required can be substantially lower than custom hardware design for an FPGA fabric and possibly even lower than current “C-to-gates” high-level synthesis (HLS) tools. Our measurements on a Xilinx Zynq FPGA show that SNNAP yields a geometric mean of 3.8× speedup (as high as 38.1×) and 2.8× energy savings (as high as 28×) with less than 10% quality loss across all applications but one. We also compare SNNAP with designs generated by commercial HLS tools and show that SNNAP has similar performance overall, with better resource-normalized throughput on 4 out of 7 benchmarks.
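The abstract's central idea is that the "program" running on the accelerator is defined entirely by a neural network's topology and weights (plain data), not by the FPGA's configured logic, so retargeting to a new application needs no hardware redesign. The sketch below illustrates that idea in software: a small C program evaluates a 2-4-1 multilayer perceptron standing in for an approximable scalar function. This is a minimal sketch, not the paper's implementation; the topology and weight values are invented for illustration, and in SNNAP the equivalent forward pass would execute on the FPGA-resident accelerator rather than on the CPU.

```c
/* Minimal sketch of "NN as the reconfigurable substrate": the accelerated
 * function is defined by topology and weights (data), not by hardware
 * structure. The 2-4-1 topology and all weight values are dummies made up
 * for illustration; a compiler-trained network would supply them. */
#include <math.h>
#include <stdio.h>

#define N_IN  2
#define N_HID 4

static float sigmoid(float x) { return 1.0f / (1.0f + expf(-x)); }

/* Hidden-layer weights; the extra column per row holds the bias. */
static const float w1[N_HID][N_IN + 1] = {
    { 0.5f, -0.3f,  0.1f },
    {-0.7f,  0.9f,  0.0f },
    { 0.2f,  0.4f, -0.2f },
    { 0.8f, -0.6f,  0.3f },
};
/* Output-layer weights; the last entry is the output bias. */
static const float w2[N_HID + 1] = { 0.6f, -0.4f, 0.7f, 0.1f, -0.1f };

/* Forward pass replacing an approximable scalar function of two inputs. */
static float nn_invoke(const float in[N_IN])
{
    float hid[N_HID];
    for (int j = 0; j < N_HID; j++) {
        float acc = w1[j][N_IN];               /* bias term */
        for (int i = 0; i < N_IN; i++)
            acc += w1[j][i] * in[i];
        hid[j] = sigmoid(acc);
    }
    float acc = w2[N_HID];                     /* output bias */
    for (int j = 0; j < N_HID; j++)
        acc += w2[j] * hid[j];
    return sigmoid(acc);
}

int main(void)
{
    const float in[N_IN] = { 0.25f, 0.75f };
    printf("approximate output: %f\n", nn_invoke(in));
    return 0;
}
```

Swapping the weight arrays (and, on SNNAP, the topology descriptor) retargets the same evaluation engine to a different application, which is why no FPGA resynthesis or hardware expertise is needed per program.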