{"title":"An Optimized Architecture For Decomposed Convolutional Neural Networks","authors":"Fangxuan Sun, Jun Lin, Zhongfeng Wang","doi":"10.1109/ISVLSI.2018.00100","DOIUrl":null,"url":null,"abstract":"Convolutional neural networks (CNNs) have found extensive applications in various tasks. However, the state-of-the-art CNNs are both computation-intensive and memory-intensive, which brings tremendous hardware implementation challenges. Various methods have been proposed to reduce the model size and computation complexity of a CNN. Among them, when hardware implementation is considered, the Canonical Polyadic decomposition (CPD) method is more suitable due to the regularity in the decomposed filters. Moreover, the CPD method can be combined with widely used pruning methods to compress the model in further. In this paper, to the best of our knowledge, an efficient hardware architecture for CPD-CNNs is proposed for the first time based on a carefully designed data flow. In detail, a reconfigurable fast convolution unit is introduced to reduce the number of multiplications while handling some commonly-used convolution core operations. The proposed architecture is coded with RTL and synthesized under the TSMC 90nm CMOS technology. 
Our design achieves an equivalent throughput of more than 3TOP/s under 650MHz clock frequency.","PeriodicalId":114330,"journal":{"name":"2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISVLSI.2018.00100","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Convolutional neural networks (CNNs) have found extensive applications in various tasks. However, state-of-the-art CNNs are both computation-intensive and memory-intensive, which poses tremendous challenges for hardware implementation. Various methods have been proposed to reduce the model size and computational complexity of a CNN. Among them, when hardware implementation is considered, the Canonical Polyadic decomposition (CPD) method is particularly suitable due to the regularity of the decomposed filters. Moreover, the CPD method can be combined with widely used pruning methods to compress the model further. In this paper, to the best of our knowledge, we propose the first efficient hardware architecture for CPD-CNNs, built on a carefully designed data flow. Specifically, a reconfigurable fast convolution unit is introduced to reduce the number of multiplications while handling some commonly used convolution core operations. The proposed architecture is coded in RTL and synthesized with TSMC 90 nm CMOS technology. Our design achieves an equivalent throughput of more than 3 TOP/s at a 650 MHz clock frequency.