{"title":"Accelerator Design for Vector Quantized Convolutional Neural Network","authors":"Yi-Heng Wu, Heng Lee, Yu Sheng Lin, Shao-Yi Chien","doi":"10.1109/AICAS.2019.8771469","DOIUrl":null,"url":null,"abstract":"In recent years, deep convolutional neural networks (CNNs) achieve ground-breaking success in many computer vision research fields. Due to the large model size and tremendous computation of CNNs, they cannot be efficiently executed in small devices like mobile phones. Although several hardware accelerator architectures have been developed, most of them can only efficient address one of the two major layers in CNN, convolutional (CONV) and fully connected (FC) layers. In this paper, based on algorithm-architecture-co-exploration, our architecture targets at executing both layers with high efficiency. Vector quantization technique is first selected to compress the parameters, reduce the computation, and unify the behaviors of both CONV and FC layers. To fully exploit the gain of vector quantization, we then propose an accelerator architecture for quantized CNN. Different DRAM access schemes are employed to reduce DRAM access. We also design a high-throughput processing element architecture to accelerate quantized layers. Compare to previous accelerators for CNN, the proposed architecture achieves 1.2–5x less DRAM access and 1.5–5x higher throughput for both CONV and FC layers.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AICAS.2019.8771469","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
In recent years, deep convolutional neural networks (CNNs) achieve ground-breaking success in many computer vision research fields. Due to the large model size and tremendous computation of CNNs, they cannot be efficiently executed in small devices like mobile phones. Although several hardware accelerator architectures have been developed, most of them can only efficient address one of the two major layers in CNN, convolutional (CONV) and fully connected (FC) layers. In this paper, based on algorithm-architecture-co-exploration, our architecture targets at executing both layers with high efficiency. Vector quantization technique is first selected to compress the parameters, reduce the computation, and unify the behaviors of both CONV and FC layers. To fully exploit the gain of vector quantization, we then propose an accelerator architecture for quantized CNN. Different DRAM access schemes are employed to reduce DRAM access. We also design a high-throughput processing element architecture to accelerate quantized layers. Compare to previous accelerators for CNN, the proposed architecture achieves 1.2–5x less DRAM access and 1.5–5x higher throughput for both CONV and FC layers.