{"title":"矢量量化卷积神经网络加速器设计","authors":"Yi-Heng Wu, Heng Lee, Yu Sheng Lin, Shao-Yi Chien","doi":"10.1109/AICAS.2019.8771469","DOIUrl":null,"url":null,"abstract":"In recent years, deep convolutional neural networks (CNNs) achieve ground-breaking success in many computer vision research fields. Due to the large model size and tremendous computation of CNNs, they cannot be efficiently executed in small devices like mobile phones. Although several hardware accelerator architectures have been developed, most of them can only efficient address one of the two major layers in CNN, convolutional (CONV) and fully connected (FC) layers. In this paper, based on algorithm-architecture-co-exploration, our architecture targets at executing both layers with high efficiency. Vector quantization technique is first selected to compress the parameters, reduce the computation, and unify the behaviors of both CONV and FC layers. To fully exploit the gain of vector quantization, we then propose an accelerator architecture for quantized CNN. Different DRAM access schemes are employed to reduce DRAM access. We also design a high-throughput processing element architecture to accelerate quantized layers. Compare to previous accelerators for CNN, the proposed architecture achieves 1.2–5x less DRAM access and 1.5–5x higher throughput for both CONV and FC layers.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Accelerator Design for Vector Quantized Convolutional Neural Network\",\"authors\":\"Yi-Heng Wu, Heng Lee, Yu Sheng Lin, Shao-Yi Chien\",\"doi\":\"10.1109/AICAS.2019.8771469\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, deep convolutional neural networks (CNNs) achieve ground-breaking success in many computer vision research fields. Due to the large model size and tremendous computation of CNNs, they cannot be efficiently executed in small devices like mobile phones. Although several hardware accelerator architectures have been developed, most of them can only efficient address one of the two major layers in CNN, convolutional (CONV) and fully connected (FC) layers. In this paper, based on algorithm-architecture-co-exploration, our architecture targets at executing both layers with high efficiency. Vector quantization technique is first selected to compress the parameters, reduce the computation, and unify the behaviors of both CONV and FC layers. To fully exploit the gain of vector quantization, we then propose an accelerator architecture for quantized CNN. Different DRAM access schemes are employed to reduce DRAM access. We also design a high-throughput processing element architecture to accelerate quantized layers. 
Compare to previous accelerators for CNN, the proposed architecture achieves 1.2–5x less DRAM access and 1.5–5x higher throughput for both CONV and FC layers.\",\"PeriodicalId\":273095,\"journal\":{\"name\":\"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)\",\"volume\":\"28 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-03-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/AICAS.2019.8771469\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AICAS.2019.8771469","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Accelerator Design for Vector Quantized Convolutional Neural Network
In recent years, deep convolutional neural networks (CNNs) have achieved ground-breaking success in many computer vision research fields. Due to their large model size and tremendous computation, CNNs cannot be executed efficiently on small devices such as mobile phones. Although several hardware accelerator architectures have been developed, most of them can efficiently address only one of the two major layer types in a CNN: convolutional (CONV) layers and fully connected (FC) layers. In this paper, through algorithm-architecture co-exploration, our architecture targets executing both layer types with high efficiency. Vector quantization is first adopted to compress the parameters, reduce the computation, and unify the behaviors of CONV and FC layers. To fully exploit the gains of vector quantization, we then propose an accelerator architecture for the quantized CNN. Different DRAM access schemes are employed to reduce DRAM traffic, and we design a high-throughput processing element architecture to accelerate the quantized layers. Compared to previous CNN accelerators, the proposed architecture achieves 1.2–5x less DRAM access and 1.5–5x higher throughput for both CONV and FC layers.
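The abstract gives no implementation details, so the following is only a minimal Python sketch of the general vector-quantization idea it refers to: layer weights are split into sub-vectors, each replaced by the index of its nearest codeword in a small learned codebook, and inference reconstructs weights by table lookup. All names, the sub-vector length d, and the codebook size K are illustrative assumptions, not the paper's configuration; CONV layers would first be lowered to matrix multiplication (im2col) so that they share the same lookup-and-accumulate behavior as FC layers.

```python
# Minimal sketch of vector-quantized weights (illustrative, not the paper's design).
import numpy as np

def vq_compress(W, d=4, K=64, iters=10):
    """Quantize W's length-d weight sub-vectors against a shared K-entry codebook (plain k-means)."""
    out_dim, in_dim = W.shape
    assert in_dim % d == 0
    subvecs = W.reshape(-1, d)                      # all length-d sub-vectors
    codebook = subvecs[np.random.choice(len(subvecs), K, replace=False)]
    for _ in range(iters):
        # assign each sub-vector to its nearest codeword, then update codewords
        dists = ((subvecs[:, None, :] - codebook[None]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for k in range(K):
            members = subvecs[assign == k]
            if len(members):
                codebook[k] = members.mean(0)
    indices = assign.reshape(out_dim, in_dim // d)  # one small index per sub-vector
    return codebook, indices.astype(np.uint8)       # 8-bit indices suffice for K <= 256

def vq_fc_forward(x, codebook, indices):
    """FC layer with quantized weights: rebuild the weight matrix by codebook lookup."""
    W_hat = codebook[indices].reshape(indices.shape[0], -1)
    return x @ W_hat.T

# Usage: compress a 128x64 weight matrix and run one input through the quantized layer.
W = np.random.randn(128, 64).astype(np.float32)
codebook, idx = vq_compress(W, d=4, K=64)
y = vq_fc_forward(np.random.randn(1, 64).astype(np.float32), codebook, idx)
```

Storing only the codebook and the per-sub-vector indices is what shrinks the parameters and the DRAM traffic, and letting both CONV (after lowering) and FC layers run as the same lookup-and-accumulate pattern is consistent with the abstract's claim that vector quantization unifies the behaviors of the two layer types.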