Exploring Efficient Hardware Accelerator for Learning-Based Image Compression

Impact factor: 2.9; Zone 3 (Computer Science); JCR Q2 (Computer Science, Hardware & Architecture)
Chen Chen;Haoyang Zhang;Kaicheng Guo;Xingzi Yu;Weidong Qiu;Zhengwei Qi;Haibing Guan
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 6, pp. 2204-2217
DOI: 10.1109/TCAD.2024.3515856
Publication date: 2024-12-11
Publication type: Journal Article
URL: https://ieeexplore.ieee.org/document/10793077/
Citation count: 0

Abstract

Recently, learning-based image compression (LIC) methods have surpassed manually designed approaches in both compression quality and bitrate. However, increasing computational demands and insufficient optimization of codec performance have hindered the advancement of LIC acceleration. Most research focuses on optimizing specific components and often neglects the sources of underutilization during the execution of LIC models. Efficient LIC acceleration generally faces three primary challenges: 1) extra overhead introduced by individual optimizations; 2) load and computation imbalances in small kernels; and 3) mismatches between hardware configurations and LIC models. To address these challenges, we propose a framework named extensive accelerator for LIC (X-LIC) for efficiently exploring the design space under constrained resources. First, we quantitatively characterize a representative LIC model, including its latency, computation size, and temporal utilization across various accelerators. We design a hardware-optimized quantization method to compensate for the lack of LIC-oriented research, particularly regarding data precision, distortion, and resource consumption. Additionally, we propose a parameterized LIC accelerator architecture that integrates seamlessly with existing loop optimization models and supports various LIC operators. Two optimization schemes are proposed to address redundant computation in transposed convolution and load and computation imbalance in small kernels. Experimental results show that our framework demonstrates significant flexibility across a broad design space, achieving an average of 78%–95% of the theoretical peak performance and up to 688.2/759.1 GOP/s encoder/decoder performance with INT8 precision. As a result, encoder/decoder performance can reach up to 33/36 FPS at 720p resolution. An FPGA demo of X-LIC is available at https://github.com/sjtu-tcloud/X-LIC.
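The abstract mentions redundant computation in transposed convolution but does not describe the scheme X-LIC uses. As a generic illustration only, and not the authors' method, the sketch below shows where such redundancy can come from in one dimension: implementing a strided transposed convolution by inserting zeros and then running an ordinary convolution spends multiplications on the inserted zeros, while an equivalent scatter formulation touches only real input samples. The function names and the 1-D, integer-valued setting are assumptions made for this example.

```python
def tconv_zero_insert(x, w, stride):
    """1-D transposed convolution via zero insertion plus full convolution.

    Returns (output, number_of_multiplications). Hypothetical helper.
    """
    # Insert (stride - 1) zeros between neighbouring input samples.
    up = []
    for i, v in enumerate(x):
        up.append(v)
        if i != len(x) - 1:
            up.extend([0] * (stride - 1))

    k = len(w)
    out = [0] * (len(up) + k - 1)
    macs = 0
    for n in range(len(out)):
        for j in range(k):
            idx = n - j
            if 0 <= idx < len(up):
                # Many of these products have up[idx] == 0 (wasted work).
                out[n] += up[idx] * w[j]
                macs += 1
    return out, macs


def tconv_scatter(x, w, stride):
    """Same result, computed by scattering each input sample through the kernel.

    Returns (output, number_of_multiplications). Hypothetical helper.
    """
    k = len(w)
    out = [0] * ((len(x) - 1) * stride + k)
    macs = 0
    for i, v in enumerate(x):
        for j in range(k):
            out[i * stride + j] += v * w[j]
            macs += 1
    return out, macs


if __name__ == "__main__":
    x, w, stride = [1, 2, 3], [1, 1, 1], 2
    dense_out, dense_macs = tconv_zero_insert(x, w, stride)
    sparse_out, sparse_macs = tconv_scatter(x, w, stride)
    print(dense_out == sparse_out, dense_macs, sparse_macs)
```

For an input of length n, kernel length k, and stride s, the zero-insertion form performs ((n - 1) * s + 1) * k multiplications against n * k for the scatter form, so the share of wasted work grows with the stride. How a real accelerator removes this redundancy in two dimensions, and at what hardware cost, is a separate design question that this sketch does not address.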
Source journal
CiteScore: 5.60
Self-citation rate: 13.80%
Articles published: 500
Review time: 7 months
About the journal: The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical, and logical design, including planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design, and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.