Sumin Lee, K. Lee, Sunghwan Joo, Hong Keun Ahn, Junghyup Lee, Dohyung Kim, Bumsub Ham, Seong-ook Jung
ESSCIRC 2022- IEEE 48th European Solid State Circuits Conference (ESSCIRC), published 2022-09-19. DOI: 10.1109/ESSCIRC55480.2022.9911509 (https://doi.org/10.1109/ESSCIRC55480.2022.9911509)
SIF-NPU: A 28nm 3.48 TOPS/W 0.25 TOPS/mm2 CNN Accelerator with Spatially Independent Fusion for Real-Time UHD Super-Resolution
This paper proposes a convolutional neural network (CNN)-based super-resolution (SR) accelerator for real-time up-scaling to ultra-HD (UHD) resolution on edge devices. A novel error-compensated bit quantization is adopted to reduce the bit depth in the SR task. Spatially independent layer fusion is exploited to satisfy the high throughput requirements at UHD resolution by increasing parallelism. Burst operation with a write mask in the dual-port SRAM increases processing element utilization by allowing concurrent multi-access without additional memory. The accelerator is implemented in 28 nm technology and achieves a figure of merit (FoM, $\text{TOPS}/\text{mm}^{2} \times \text{TOPS/W}$) of 0.87, at least 4.3 times higher than state-of-the-art CNN accelerators. The implemented accelerator supports up-scaling at up to 96 frames per second at UHD resolution.
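The headline numbers can be cross-checked with a little arithmetic. The sketch below multiplies the two efficiency figures from the title to recover the FoM quoted in the abstract, and estimates the output pixel rate at the stated frame rate. The function names are illustrative. The 3840 x 2160 frame size is an assumption (the common UHD definition), since the abstract does not state the exact output dimensions.

```python
# Sanity check of the figures quoted in the title and abstract.
# Function names are hypothetical; UHD = 3840 x 2160 is an assumption,
# not a value given in the abstract.

def figure_of_merit(tops_per_watt: float, tops_per_mm2: float) -> float:
    """FoM as defined in the abstract: TOPS/mm^2 multiplied by TOPS/W."""
    return tops_per_mm2 * tops_per_watt


def output_pixel_rate(width: int, height: int, fps: int) -> int:
    """Output pixels produced per second at a given resolution and frame rate."""
    return width * height * fps


# Energy and area efficiency taken from the paper title.
fom = figure_of_merit(tops_per_watt=3.48, tops_per_mm2=0.25)

# Assumed UHD frame size, with the 96 frames per second from the abstract.
pixel_rate = output_pixel_rate(width=3840, height=2160, fps=96)

print(f"FoM: {fom:.2f}")                         # FoM: 0.87
print(f"Output pixels per second: {pixel_rate}")  # 796262400
```

Under these assumptions, the product of the title figures matches the FoM of 0.87 in the abstract, and the accelerator would need to produce roughly 0.8 gigapixels per second at its peak frame rate.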