Enhancing real-time UHD intra-frame coding with parallel–serial hybrid neural networks

IF 3.4 · Zone 2 Engineering & Technology · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Displays Pub Date : 2025-03-25 DOI:10.1016/j.displa.2025.103034

Yucheng Jiang, Songping Mai, Peng Zhang, Junwei Hu, Jie Yu, Jian Cheng

Displays, vol. 88, Article 103034 · https://www.sciencedirect.com/science/article/pii/S014193822500071X

Citations: 0

Abstract

The primary objective of a video encoder is to achieve both high real-time performance and a high compression ratio. Delivering these capabilities on cost-effective hardware is crucial for practical applications. Numerous institutions have developed highly optimized implementations of the mainstream video coding standards, such as x265 for HEVC, VVenC for VVC, and uAVS3e for AVS3. However, these implementations still cannot encode 4K/8K UHD video in real time without significantly reducing compression complexity. This paper presents a parallel–serial hybrid neural network scheme tailored to speed up intra-frame block partitioning decisions. The parallel network is designed to extract effective features while minimizing the impact of network inference time. Meanwhile, the lightweight serial network overcomes the accuracy issue caused by the data dependency that reconstructed pixels introduce. The proposed enhancement scheme is integrated into the uAVS3 Real-Time encoder. Experimental results on the CTC 4K UHD sequences show a significant increase in encoding speed (+30.2%) together with an improvement in encoding quality, as evidenced by a 0.24% reduction in BD-BR. Compared with previous work, we achieve the best trade-off between these two critical metrics. Furthermore, we integrated the enhanced encoder into the FFmpeg framework, yielding an efficient video encoding system capable of 4K@50FPS and 8K@9FPS on affordable hardware configurations.
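To put the reported figures in perspective, the short sketch below converts them into pixel throughput and time saved. It is only back-of-the-envelope arithmetic, not part of the paper: the frame sizes (3840x2160 for 4K, 7680x4320 for 8K) are assumptions, since the abstract does not state them, and the +30.2% figure is assumed to mean a throughput (frames-per-second) gain. The helper names `pixel_rate` and `time_saving` are hypothetical.

```python
# Assumed frame sizes (not stated in the abstract): 4K = 3840x2160, 8K = 7680x4320.
FRAME_SIZES = {"4K": (3840, 2160), "8K": (7680, 4320)}


def pixel_rate(label: str, fps: float) -> float:
    """Luma samples the encoder must process per second at the given frame rate."""
    width, height = FRAME_SIZES[label]
    return width * height * fps


def time_saving(speed_gain: float) -> float:
    """Fraction of encoding time saved when throughput rises by `speed_gain`.

    Assumes `speed_gain` is a relative throughput increase (0.302 for +30.2%).
    """
    return 1.0 - 1.0 / (1.0 + speed_gain)


if __name__ == "__main__":
    for label, fps in (("4K", 50), ("8K", 9)):
        rate = pixel_rate(label, fps)
        budget_ms = 1000.0 / fps  # time available per frame for real-time operation
        print(f"{label}@{fps}FPS: {rate / 1e6:.1f} Mpixel/s, {budget_ms:.1f} ms per frame")
    print(f"+30.2% speed is about {time_saving(0.302):.1%} less encoding time")
```

Under these assumptions, the 8K@9FPS operating point corresponds to somewhat lower raw pixel throughput than 4K@50FPS, and a 30.2% throughput gain corresponds to roughly a 23% reduction in encoding time.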


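Source journal

Displays (Engineering & Technology, Engineering: Electrical & Electronic)

CiteScore: 4.60

Self-citation rate: 25.60%

Articles published: 138

Review time: 92 days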

Journal description: Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including the display-human interface. Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals, intended for display technology and human factors engineers new to the field, will also occasionally be featured.