High-Efficiency Convolutional Ternary Neural Networks with Custom Adder Trees and Weight Compression

ACM Transactions on Reconfigurable Technology and Systems (TRETS) Pub Date: 2018-12-12 DOI: 10.1145/3270764

Adrien Prost-Boucle, A. Bourge, F. Pétrot


Citations: 20

Abstract

Although performing inference with artificial neural networks (ANNs) was until quite recently considered essentially compute intensive, the emergence of deep neural networks coupled with the evolution of integration technology has transformed inference into a memory-bound problem. With this established, many recent works have focused on minimizing memory accesses, either by enforcing and exploiting sparsity in weights or by using few bits to represent activations and weights, so that ANN inference can be used in embedded devices. In this work, we detail an architecture dedicated to inference using ternary {−1, 0, 1} weights and activations. This architecture is configurable at design time to provide throughput vs. power trade-offs to choose from. It is also generic in the sense that it uses information drawn from the target technologies (memory geometries and cost, number of available cuts, etc.) to adapt as well as possible to the FPGA resources. This makes it possible to achieve up to 5.2k frames per second per Watt for classification on a VC709 board using approximately half of the resources of the FPGA.
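To make the ternary arithmetic concrete, here is a minimal software sketch of the two ideas named in the title. It is an illustration under stated assumptions, not the authors' hardware design: the function names (`ternary_dot`, `pack_trits`, `unpack_trits`) are hypothetical, and the base-3 packing of five ternary values per byte (possible because 3^5 = 243 ≤ 256) is one common compression scheme that the abstract does not confirm as the paper's own.

```python
from typing import List, Sequence

TERNARY = (-1, 0, 1)


def ternary_dot(weights: Sequence[int], activations: Sequence[int]) -> int:
    """Dot product of two ternary vectors, reduced as a pairwise adder tree."""
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    if any(v not in TERNARY for v in (*weights, *activations)):
        raise ValueError("all values must be in {-1, 0, 1}")
    # The product of two ternary values is itself ternary, so hardware
    # needs only sign selection here, not a real multiplier.
    level = [w * a for w, a in zip(weights, activations)]
    # Sum neighbours pairwise, level by level, like a balanced adder tree.
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd levels with a neutral element
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0


def pack_trits(trits: Sequence[int]) -> bytes:
    """Pack ternary values five per byte as base-3 digits (assumed scheme)."""
    if any(t not in TERNARY for t in trits):
        raise ValueError("all values must be in {-1, 0, 1}")
    out = bytearray()
    for start in range(0, len(trits), 5):
        value = 0
        # The first trit of each group becomes the least significant digit.
        for t in reversed(trits[start:start + 5]):
            value = value * 3 + (t + 1)  # map {-1, 0, 1} to {0, 1, 2}
        out.append(value)  # at most 242, so it always fits in one byte
    return bytes(out)


def unpack_trits(data: bytes, count: int) -> List[int]:
    """Inverse of pack_trits; count is the original number of trits."""
    trits: List[int] = []
    for byte in data:
        for _ in range(5):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:count]
```

With this packing, each ternary value costs 1.6 bits instead of the 2 bits of a naive encoding, which matters when inference is memory bound.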




Source journal

ACM Transactions on Reconfigurable Technology and Systems (TRETS)

Self-citation rate

0.00%

Publication count