{"title":"基于重叠条纹推理的4.6-8.3 TOPS/W 1.2-4.9 TOPS cnn计算成像处理器,实现4K超高清30fps","authors":"Yu-Chun Ding, Kai-Pin Lin, Chi-Wen Weng, Li-Wei Wang, Huan-Ching Wang, Chun-Yeh Lin, Yong-Tai Chen, Chao-Tsung Huang","doi":"10.1109/ESSCIRC55480.2022.9911515","DOIUrl":null,"url":null,"abstract":"In this paper, we present an energy-efficient acceler-ator chip which supports high-quality CNN-based computational imaging applications at 4K UItra-UD 30fps. To address the huge requirement of DRAM bandwidth and computing energy, an overlapped stripe inference flow and a structure-sparse $\\text{CONV}3\\mathrm{x}3$ engine are proposed respectively. The former reduces DRAM bandwidth to 0.81-1.74 GB/s when supporting high-quality CNN inference with 16 to 29 layers at 4K UItra-UD 30fps. The latter reduces computing complexity by 40% without noticeable quality degradation, e.g. 0.02-0.03 dB of PSNR drop. More specifically, it uses only 4.9 intrinsic TOPS of computing capability at 200 MHz to approach the quality of dense models which demand up to 8.2 TOPS. In addition, a coarse-grained reconfigurable datapath is designed to support diverse Applications including super-resolution, denoising, and style transfer with high hardware efficiency. Fabricated in 40nm CMOS, this chip achieves 4.6-8.3 TOP/W of energy efficiency for high-quality computational imaging applications. We also implement an FPGA-aided system to demonstrate real-time processing for the diverse applications supported by the fabricated chip.","PeriodicalId":168466,"journal":{"name":"ESSCIRC 2022- IEEE 48th European Solid State Circuits Conference (ESSCIRC)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A 4.6-8.3 TOPS/W 1.2-4.9 TOPS CNN-based Computational Imaging Processor with Overlapped Stripe Inference Achieving 4K Ultra-HD 30fps\",\"authors\":\"Yu-Chun Ding, Kai-Pin Lin, Chi-Wen Weng, Li-Wei Wang, Huan-Ching Wang, Chun-Yeh Lin, Yong-Tai Chen, Chao-Tsung Huang\",\"doi\":\"10.1109/ESSCIRC55480.2022.9911515\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we present an energy-efficient acceler-ator chip which supports high-quality CNN-based computational imaging applications at 4K UItra-UD 30fps. To address the huge requirement of DRAM bandwidth and computing energy, an overlapped stripe inference flow and a structure-sparse $\\\\text{CONV}3\\\\mathrm{x}3$ engine are proposed respectively. The former reduces DRAM bandwidth to 0.81-1.74 GB/s when supporting high-quality CNN inference with 16 to 29 layers at 4K UItra-UD 30fps. The latter reduces computing complexity by 40% without noticeable quality degradation, e.g. 0.02-0.03 dB of PSNR drop. More specifically, it uses only 4.9 intrinsic TOPS of computing capability at 200 MHz to approach the quality of dense models which demand up to 8.2 TOPS. In addition, a coarse-grained reconfigurable datapath is designed to support diverse Applications including super-resolution, denoising, and style transfer with high hardware efficiency. Fabricated in 40nm CMOS, this chip achieves 4.6-8.3 TOP/W of energy efficiency for high-quality computational imaging applications. 
We also implement an FPGA-aided system to demonstrate real-time processing for the diverse applications supported by the fabricated chip.\",\"PeriodicalId\":168466,\"journal\":{\"name\":\"ESSCIRC 2022- IEEE 48th European Solid State Circuits Conference (ESSCIRC)\",\"volume\":\"73 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ESSCIRC 2022- IEEE 48th European Solid State Circuits Conference (ESSCIRC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ESSCIRC55480.2022.9911515\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ESSCIRC 2022- IEEE 48th European Solid State Circuits Conference (ESSCIRC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ESSCIRC55480.2022.9911515","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A 4.6-8.3 TOPS/W 1.2-4.9 TOPS CNN-based Computational Imaging Processor with Overlapped Stripe Inference Achieving 4K Ultra-HD 30fps
Abstract: In this paper, we present an energy-efficient accelerator chip that supports high-quality CNN-based computational imaging applications at 4K Ultra-HD 30fps. To address the large DRAM bandwidth and computing-energy requirements, we propose an overlapped stripe inference flow and a structure-sparse CONV3x3 engine, respectively. The former reduces DRAM bandwidth to 0.81-1.74 GB/s while supporting high-quality CNN inference with 16 to 29 layers at 4K Ultra-HD 30fps. The latter reduces computing complexity by 40% without noticeable quality degradation, e.g. a PSNR drop of only 0.02-0.03 dB; it uses only 4.9 TOPS of intrinsic computing capability at 200 MHz to approach the quality of dense models that demand up to 8.2 TOPS. In addition, a coarse-grained reconfigurable datapath is designed to support diverse applications, including super-resolution, denoising, and style transfer, with high hardware efficiency. Fabricated in 40nm CMOS, the chip achieves 4.6-8.3 TOPS/W of energy efficiency for high-quality computational imaging applications. We also implement an FPGA-aided system to demonstrate real-time processing for the diverse applications supported by the fabricated chip.
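The headline numbers lend themselves to a quick sanity check. The sketch below is a back-of-envelope estimate, not taken from the paper: it assumes 8-bit RGB frames, that overlapped stripe inference keeps all intermediate feature maps on-chip so only the input and output frames cross DRAM, and stride-1 CONV3x3 layers so each layer adds one pixel of halo per stripe side; the 512-pixel stripe width is hypothetical.

```python
# Back-of-envelope estimates for the figures quoted in the abstract.
# Illustrative assumptions only (pixel format, stripe width, overlap
# accounting); these are not the paper's exact bandwidth model.

def frame_io_bandwidth_gbps(width=3840, height=2160, fps=30,
                            bytes_per_pixel=3):
    """DRAM traffic if only the input and output frames are streamed,
    i.e. all intermediate feature maps stay on-chip, which is the goal
    of stripe-based inference."""
    one_frame = width * height * bytes_per_pixel   # bytes per frame
    return 2 * one_frame * fps / 1e9               # read + write, GB/s

def stripe_overlap_factor(stripe_width, num_conv3x3_layers):
    """With stride-1 CONV3x3 layers, each layer widens the receptive
    field by one pixel per side, so an overlapped stripe re-fetches a
    halo of that many pixels on each side."""
    halo = num_conv3x3_layers
    return (stripe_width + 2 * halo) / stripe_width

def sparse_tops(dense_tops, mac_reduction=0.40):
    """Intrinsic TOPS remaining after pruning 40% of the MACs."""
    return dense_tops * (1.0 - mac_reduction)

if __name__ == "__main__":
    print(f"frame-only DRAM traffic: {frame_io_bandwidth_gbps():.2f} GB/s")
    print(f"overlap re-fetch, 29 layers, 512-px stripes: "
          f"x{stripe_overlap_factor(512, 29):.2f}")
    print(f"8.2 TOPS dense -> {sparse_tops(8.2):.1f} TOPS sparse")
```

Under these assumptions, frame-only traffic comes to roughly 1.5 GB/s, in line with the reported 0.81-1.74 GB/s range (applications differ in input/output resolution and bit depth), and pruning 40% of the MACs from an 8.2 TOPS dense workload leaves about 4.9 TOPS, matching the quoted intrinsic computing capability.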