A Novel Transform Accelerator With Fast Kernel Selection and Efficient Transform Circuit
Authors: Zhijian Hao; Chenlong He; Jiaming Liu; Qi Zheng; Jinchang Xu; Peijun Ma; Xiaohua Ma; Yue Hao
Journal: IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 72, no. 6, pp. 2726-2739 (impact factor 5.2; JCR Q1, Engineering, Electrical & Electronic)
Published: 2025-02-25
DOI: 10.1109/TCSI.2025.3543575
URL: https://ieeexplore.ieee.org/document/10902513/
The introduction of multiple transform types into the Versatile Video Coding (VVC) standard has yielded notable encoding gains but also resulted in substantial computational burdens, posing two critical challenges for hardware implementation: fast kernel selection and efficient transform computation design. Existing studies typically address these challenges in isolation, lacking a holistic solution for VVC transform coding. In this paper, we present a groundbreaking transform accelerator that unifies transform kernel selection and multiple transform circuits within a single framework. In terms of algorithms, driven by mechanistic analysis, we propose a decision tree-based kernel selection algorithm that ensures both high decision accuracy and computational efficiency. Additionally, we design a transfer matrix-based approximation algorithm for Discrete Sine Transform Type-7 and a matrix decomposition-based improved computation for Discrete Cosine Transform Type-2, significantly reducing the computational complexity. On the hardware front, we implement a high-precision and area-efficient transform accelerator, which integrates highly pipelined kernel selection and transform computation architectures. With multiple reuse and parallelism strategies, the accelerator demonstrates substantial resource efficiency advantages. Experimental results reveal that the proposed accelerator achieves a circuit resource reduction of over 44% with a slight performance degradation, while maintaining processing capabilities up to 8K@57 fps. To the best of our knowledge, this is the first comprehensive hardware solution for VVC transform coding that jointly addresses the challenges of kernel selection and transform circuit design.
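The abstract does not describe the authors' DCT-2 decomposition, so the following is only a generic illustration of how a matrix decomposition can cut transform cost. It uses the classic even/odd (partial butterfly) split of the Discrete Cosine Transform Type-2, which is not claimed to be the paper's method. The function names are hypothetical, and the sketch uses a floating-point orthonormal matrix, whereas VVC uses scaled integer matrices.

```python
import math


def dct2_matrix(n):
    """Orthonormal floating-point DCT-II matrix (illustrative, not VVC's integer matrix)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def dct2_direct(x):
    """Reference N-point DCT-II as a full matrix-vector product (N*N multiplications)."""
    n = len(x)
    c = dct2_matrix(n)
    return [sum(c[k][i] * x[i] for i in range(n)) for k in range(n)]


def dct2_even_odd(x):
    """Same transform via the even/odd split (N*N/2 multiplications, N must be even).

    Even-indexed basis rows are symmetric and odd-indexed rows are antisymmetric,
    so each output needs only the first half of its row, applied to the mirrored
    sums (even rows) or differences (odd rows) of the input.
    """
    n = len(x)
    if n % 2 != 0:
        raise ValueError("even/odd split requires an even length")
    half = n // 2
    c = dct2_matrix(n)
    sums = [x[i] + x[n - 1 - i] for i in range(half)]
    diffs = [x[i] - x[n - 1 - i] for i in range(half)]
    out = []
    for k in range(n):
        src = sums if k % 2 == 0 else diffs
        out.append(sum(c[k][i] * src[i] for i in range(half)))
    return out


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    direct = dct2_direct(sample)
    fast = dct2_even_odd(sample)
    print(max(abs(a - b) for a, b in zip(direct, fast)))
```

Applying the same split recursively to the even half yields the familiar butterfly structure. A hardware design would typically go further by sharing multipliers across transform sizes.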
About the journal:
TCAS I publishes regular papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest covered includes:
- Circuits: Analog, Digital and Mixed Signal Circuits and Systems
- Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
- Software for Analog-and-Logic Circuits and Systems
- Control aspects of Circuits and Systems