Flexible Multi-Precision Accelerator Design for Deep Convolutional Neural Networks Considering Both Data Computation and Communication
Shen-Fu Hsiao, Yu-Hong Chen
2020 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), August 2020
DOI: 10.1109/VLSI-DAT49148.2020.9196465
Due to rapid advances in deep convolutional neural networks (CNNs), hardware acceleration of convolution computations on edge devices is crucial for many artificial-intelligence applications. This paper presents a CNN accelerator design that supports various filter sizes/strides and multiple bit precisions. In particular, we analyze the latency of data communication and computation and determine the precision that maximizes the utilization efficiency of the available hardware resources. The proposed design supports 8-bit and 16-bit data precision, and 2-bit, 4-bit, 8-bit, and 16-bit weight precision, for popular CNN models. It effectively improves the speed of low-precision computation by exploiting the additional parallelism.
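The parallelism claim in the abstract can be made concrete with a small sketch. The following is a hypothetical illustration, not the authors' actual RTL design: it assumes a fixed-width 16-bit multiplier lane that can be subdivided into independent sub-multipliers at the chosen weight precision, so that halving the weight bit-width doubles the number of MAC operations performed per lane per cycle. The datapath width, function names, and quantization scheme are assumptions for illustration only.

```python
# Hypothetical sketch (not the paper's implementation): why lower weight
# precision yields more parallelism on a fixed-width multiplier lane.

DATAPATH_BITS = 16  # assumed lane width, matching the widest supported precision


def lanes_per_multiplier(weight_bits: int) -> int:
    """Number of parallel low-precision MACs one 16-bit lane can host,
    assuming the lane splits evenly into sub-multipliers."""
    assert weight_bits in (2, 4, 8, 16), "weight precisions the paper supports"
    return DATAPATH_BITS // weight_bits


def quantize(w: float, bits: int) -> int:
    """Symmetric uniform quantization of a weight in [-1, 1) to a signed
    `bits`-bit integer (illustrative; the paper's scheme may differ)."""
    levels = 1 << (bits - 1)          # e.g. 128 for 8-bit
    q = round(w * levels)             # scale to integer grid
    return max(-levels, min(levels - 1, q))  # clamp to representable range


if __name__ == "__main__":
    for bits in (2, 4, 8, 16):
        print(f"{bits:>2}-bit weights -> {lanes_per_multiplier(bits)} parallel MACs per lane")
```

Under this assumption, 2-bit weights allow eight MACs where a 16-bit weight allows one, which is the kind of speedup the abstract attributes to low-precision computation, provided data communication can keep the extra lanes fed.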