{"title":"Flexible Multi-Precision Accelerator Design for Deep Convolutional Neural Networks Considering Both Data Computation and Communication","authors":"Shen-Fu Hsiao, Yu-Hong Chen","doi":"10.1109/VLSI-DAT49148.2020.9196465","DOIUrl":null,"url":null,"abstract":"Due to the quick advance in deep convolutional neural networks (CNN), hardware acceleration of convolution computations for edge devices is crucial for many artificial intelligence applications. This paper presents a CNN accelerator design that supports various CNN filter sizes/strides and different bit precisions. In particular, we analyze the latency of data communication and computation and determine the proper precision that maximizes the utilization efficiency of available hardware resource. The proposed design supports data precision of 8-bit and 16-bit, and weight precision of 2-bit, 4-bit, 8-bit, and 16-bit for popular CNN models. It can effectively increase speed performance of low-precision computation by exploiting the additional parallelism.","PeriodicalId":235460,"journal":{"name":"2020 International Symposium on VLSI Design, Automation and Test (VLSI-DAT)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Symposium on VLSI Design, Automation and Test (VLSI-DAT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/VLSI-DAT49148.2020.9196465","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
Due to the rapid advances in deep convolutional neural networks (CNN), hardware acceleration of convolution computations on edge devices is crucial for many artificial intelligence applications. This paper presents a CNN accelerator design that supports various CNN filter sizes/strides and different bit precisions. In particular, we analyze the latency of data communication and computation and determine the proper precision that maximizes the utilization efficiency of the available hardware resources. The proposed design supports data precisions of 8-bit and 16-bit, and weight precisions of 2-bit, 4-bit, 8-bit, and 16-bit for popular CNN models. It effectively increases the speed of low-precision computation by exploiting the additional parallelism.
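To illustrate the kind of trade-off the abstract describes, the following Python sketch models per-layer computation and communication latency for candidate weight precisions and picks the precision with the highest modeled utilization of a fixed multiplier array. All quantities here (array throughput, bus width, the assumption that a 16-bit multiplier yields 16/w parallel w-bit products, and the selection rule itself) are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical latency model for per-layer weight-precision selection.
# Assumptions: a fixed multiplier array delivering pe_macs_16b MACs/cycle at
# 16-bit weights and 16/w times more at w-bit weights, and a memory bus that
# transfers bus_bytes_per_cycle bytes/cycle for weights and activations.

def layer_latency(macs, weight_bits, act_bits, n_weights, n_acts,
                  pe_macs_16b=256, bus_bytes_per_cycle=16):
    macs_per_cycle = pe_macs_16b * (16 // weight_bits)
    compute_cycles = macs / macs_per_cycle
    traffic_bytes = n_weights * weight_bits / 8 + n_acts * act_bits / 8
    comm_cycles = traffic_bytes / bus_bytes_per_cycle
    # Assume computation overlaps data transfer (double buffering), so the
    # layer latency is bounded by the slower of the two.
    total = max(compute_cycles, comm_cycles)
    utilization = compute_cycles / total  # fraction of cycles the array is busy
    return total, utilization

def choose_weight_precision(macs, n_weights, n_acts, act_bits=8,
                            candidates=(2, 4, 8, 16)):
    """Pick the weight precision with the highest modeled utilization.

    Iterating in ascending order with a strict '>' comparison means that among
    equally utilized (compute-bound) candidates, the lowest precision, i.e. the
    one with the most parallelism and the shortest compute time, is kept.
    """
    best_bits, best_util = None, -1.0
    for wb in candidates:
        _, util = layer_latency(macs, wb, act_bits, n_weights, n_acts)
        if util > best_util:
            best_bits, best_util = wb, util
    return best_bits, best_util

# Example: a 3x3 convolution with 256 input and 256 output channels
# on a 56x56 feature map, with 8-bit activations.
macs = 56 * 56 * 3 * 3 * 256 * 256
n_weights = 3 * 3 * 256 * 256
n_acts = 56 * 56 * 256
print(choose_weight_precision(macs, n_weights, n_acts))
```

In this simplified model, a communication-bound layer gains nothing from further lowering the weight precision, so the chooser stops at the point where the extra parallelism would sit idle; the paper's actual analysis of communication and computation latency is presumably more detailed.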