{"title":"An Optimized Architecture For Decomposed Convolutional Neural Networks","authors":"Fangxuan Sun, Jun Lin, Zhongfeng Wang","doi":"10.1109/ISVLSI.2018.00100","DOIUrl":null,"url":null,"abstract":"Convolutional neural networks (CNNs) have found extensive applications in various tasks. However, the state-of-the-art CNNs are both computation-intensive and memory-intensive, which brings tremendous hardware implementation challenges. Various methods have been proposed to reduce the model size and computation complexity of a CNN. Among them, when hardware implementation is considered, the Canonical Polyadic decomposition (CPD) method is more suitable due to the regularity in the decomposed filters. Moreover, the CPD method can be combined with widely used pruning methods to compress the model in further. In this paper, to the best of our knowledge, an efficient hardware architecture for CPD-CNNs is proposed for the first time based on a carefully designed data flow. In detail, a reconfigurable fast convolution unit is introduced to reduce the number of multiplications while handling some commonly-used convolution core operations. The proposed architecture is coded with RTL and synthesized under the TSMC 90nm CMOS technology. 
Our design achieves an equivalent throughput of more than 3TOP/s under 650MHz clock frequency.","PeriodicalId":114330,"journal":{"name":"2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISVLSI.2018.00100","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Convolutional neural networks (CNNs) have found extensive applications in various tasks. However, state-of-the-art CNNs are both computation-intensive and memory-intensive, which poses tremendous challenges for hardware implementation. Various methods have been proposed to reduce the model size and computational complexity of a CNN. Among them, when hardware implementation is considered, the Canonical Polyadic decomposition (CPD) method is particularly suitable due to the regularity of the decomposed filters. Moreover, the CPD method can be combined with widely used pruning methods to compress the model further. In this paper, to the best of our knowledge, we propose the first efficient hardware architecture for CPD-CNNs, built on a carefully designed data flow. Specifically, a reconfigurable fast convolution unit is introduced to reduce the number of multiplications while handling some commonly used convolution core operations. The proposed architecture is coded in RTL and synthesized with TSMC 90 nm CMOS technology. Our design achieves an equivalent throughput of more than 3 TOP/s at a 650 MHz clock frequency.