A 4.6-8.3 TOPS/W 1.2-4.9 TOPS CNN-based Computational Imaging Processor with Overlapped Stripe Inference Achieving 4K Ultra-HD 30fps

Yu-Chun Ding, Kai-Pin Lin, Chi-Wen Weng, Li-Wei Wang, Huan-Ching Wang, Chun-Yeh Lin, Yong-Tai Chen, Chao-Tsung Huang
ESSCIRC 2022 - IEEE 48th European Solid State Circuits Conference (ESSCIRC), published 2022-09-19
DOI: 10.1109/ESSCIRC55480.2022.9911515
Citations: 1

Abstract

In this paper, we present an energy-efficient accelerator chip that supports high-quality CNN-based computational imaging applications at 4K Ultra-HD 30fps. To address the huge demands for DRAM bandwidth and computing energy, we propose an overlapped stripe inference flow and a structure-sparse CONV3x3 engine, respectively. The former reduces DRAM bandwidth to 0.81-1.74 GB/s when supporting high-quality CNN inference with 16 to 29 layers at 4K Ultra-HD 30fps. The latter reduces computing complexity by 40% without noticeable quality degradation, e.g., a PSNR drop of only 0.02-0.03 dB. More specifically, it uses only 4.9 intrinsic TOPS of computing capability at 200 MHz to approach the quality of dense models that demand up to 8.2 TOPS. In addition, a coarse-grained reconfigurable datapath is designed to support diverse applications, including super-resolution, denoising, and style transfer, with high hardware efficiency. Fabricated in 40nm CMOS, this chip achieves 4.6-8.3 TOPS/W of energy efficiency for high-quality computational imaging applications. We also implement an FPGA-aided system to demonstrate real-time processing of the diverse applications supported by the fabricated chip.
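The headline numbers in the abstract can be cross-checked with simple arithmetic. The Python sketch below is a back-of-envelope model, not the authors' design flow. All helper names are hypothetical, and it assumes 8-bit RGB frames (3 bytes/pixel), stride-1 "same"-padded 3x3 convolutions, stripes split along one axis, and an arbitrary illustrative stripe width of 128 columns. It checks that removing 40% of the work from an 8.2 TOPS dense model leaves about 4.9 TOPS, converts that budget into operations per cycle at 200 MHz, estimates the frame input/output traffic that any inference flow must pay, and models the recomputation cost of overlapped stripes, where each 3x3 layer adds one halo column on each side of a stripe.

```python
# Back-of-envelope checks for the abstract's figures (hypothetical helpers).
# Assumptions, not values from the paper: 8-bit RGB frames (3 bytes/pixel),
# stride-1 'same'-padded convolutions, and stripes split along one axis.

UHD_W, UHD_H, FPS = 3840, 2160, 30  # 4K Ultra-HD at 30 frames per second


def pixel_rate(width: int = UHD_W, height: int = UHD_H, fps: int = FPS) -> int:
    """Output pixels per second the processor must produce."""
    return width * height * fps


def sparse_tops(dense_tops: float, reduction: float) -> float:
    """Compute budget left after removing a fraction of the dense model's work."""
    return dense_tops * (1.0 - reduction)


def ops_per_cycle(tops: float, clock_hz: float) -> float:
    """Operations the datapath must complete per clock cycle to sustain `tops`."""
    return tops * 1e12 / clock_hz


def frame_io_gbps(upscale: int = 1, bytes_per_pixel: int = 3) -> float:
    """DRAM traffic (GB/s) to read each input frame and write each 4K output frame once.

    upscale=1 models same-size tasks (denoising, style transfer); upscale=4 models
    4x super-resolution, whose input has 1/16 as many pixels as the output.
    """
    out_bytes = pixel_rate() * bytes_per_pixel
    in_bytes = out_bytes / (upscale * upscale)
    return (out_bytes + in_bytes) / 1e9


def stripe_halo(num_layers: int, kernel: int = 3) -> int:
    """Extra input columns needed on each side of a stripe (receptive-field radius)."""
    return num_layers * ((kernel - 1) // 2)


def stripe_recompute_ratio(stripe_width: int, num_layers: int, kernel: int = 3) -> float:
    """Convolution work with overlapped stripes relative to whole-frame inference.

    Layer i (1-based) must output stripe_width + 2*r*(L - i) columns so that the
    last layer yields exactly stripe_width valid columns; frame borders are ignored.
    """
    r = (kernel - 1) // 2
    work = sum(stripe_width + 2 * r * (num_layers - i) for i in range(1, num_layers + 1))
    return work / (num_layers * stripe_width)


if __name__ == "__main__":
    px = pixel_rate()
    print(f"pixel rate: {px / 1e6:.1f} Mpixel/s")
    print(f"8.2 TOPS minus 40%: {sparse_tops(8.2, 0.40):.2f} TOPS")
    print(f"dense ops per output pixel: {8.2e12 / px:,.0f}")
    print(f"ops per cycle at 4.9 TOPS, 200 MHz: {ops_per_cycle(4.9, 200e6):,.0f}")
    print(f"frame I/O, same size: {frame_io_gbps(1):.2f} GB/s; 4x SR: {frame_io_gbps(4):.2f} GB/s")
    for layers in (16, 29):  # the layer range quoted in the abstract
        halo = stripe_halo(layers)
        ratio = stripe_recompute_ratio(128, layers)  # 128-column stripe is illustrative
        print(f"{layers} layers: halo {halo} columns/side, recompute x{ratio:.3f}")
```

Under these assumptions, the sparse budget works out to 4.92 TOPS, consistent with the quoted 4.9 TOPS. For comparison with the reported 0.81-1.74 GB/s, reading and writing 8-bit RGB frames alone costs about 0.79 GB/s (4x super-resolution) to 1.49 GB/s (same-size tasks). The stripe model gives a recomputation factor of 1 + (L - 1)/128, roughly 1.12 for 16 layers and 1.22 for 29 layers. In this model, wider stripes reduce recomputation but need proportionally wider on-chip line buffers. The abstract does not say how the chip sizes its stripes.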