Adaptive conventional-learning video signal compression framework using texture fulfillment

IF 3.1 | Region 4 (Computer Science) | JCR Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Journal of Visual Communication and Image Representation, Volume 111, Article 104544 | Pub Date: 2025-07-29 | DOI: 10.1016/j.jvcir.2025.104544

Alaa Zain, Trinh Man Hoang, Jinjia Zhou


Citations: 0

Abstract


With the explosive growth of real-time video applications, video compression has become crucial for efficient data storage and transmission. At low bit rates, conventional video coding standards can achieve small distortion but introduce hand-crafted artifacts. Learning-based end-to-end techniques, by contrast, emphasize perceptual quality, which usually leads to relatively large distortion. To address this problem, this work proposes a new video compression framework with texture fulfillment (named ACLTF) that combines conventional and learning-based video coding technologies. A video sequence is separated and compressed into a small key pack and a dominant non-key pack. On the encoder side, the key pack is compressed by conventional learning with low distortion and high texture information, at the cost of a relatively low compression ratio. The non-key pack is highly compacted by applying semantic segment-based layered coding. On the decoder side, semantic-based self-enhancement and multi-frame enhancement are applied to transfer and interpolate the high-texture information from the key pack to the non-key pack. The proposed ACLTF is compatible with all existing video coding systems. Experimental results show that applying ACLTF to the latest video coding standards (H.266/VVC, H.265/HEVC) and to learning-based video coding improves compression by 18.08%–47.57% BD rate over standard HEVC in the all-intra configuration and by 6.08%–15.78% BD rate over standard VVC in the low-delay configuration.
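
The key/non-key separation described in the abstract can be illustrated with a minimal sketch. The abstract does not say how key frames are chosen, so everything below is an assumption made for illustration: the fixed `key_interval`, the function names `split_key_pack` and `nearest_key`, and the rule that each non-key frame borrows texture from its closest key frame are hypothetical and are not the authors' method.

```python
from typing import List, Tuple


def split_key_pack(num_frames: int, key_interval: int) -> Tuple[List[int], List[int]]:
    """Split frame indices into a small key pack and a larger non-key pack.

    ASSUMPTION: key frames are taken at a fixed interval. The adaptive
    selection used by ACLTF is not described in the abstract.
    """
    if num_frames < 0 or key_interval < 1:
        raise ValueError("num_frames must be >= 0 and key_interval >= 1")
    key_pack = list(range(0, num_frames, key_interval))
    key_set = set(key_pack)
    non_key_pack = [i for i in range(num_frames) if i not in key_set]
    return key_pack, non_key_pack


def nearest_key(frame: int, key_pack: List[int]) -> int:
    """Return the key frame closest to `frame`; ties go to the earlier key.

    ASSUMPTION: a stand-in for choosing which key frame supplies texture
    to a non-key frame on the decoder side.
    """
    if not key_pack:
        raise ValueError("key_pack must not be empty")
    return min(key_pack, key=lambda k: (abs(k - frame), k))


if __name__ == "__main__":
    keys, non_keys = split_key_pack(num_frames=16, key_interval=8)
    print("key pack:", keys)
    print("non-key pack:", non_keys)
    for f in non_keys:
        print(f"frame {f} takes texture from key frame {nearest_key(f, keys)}")
```

With 16 frames and an interval of 8, the key pack holds 2 frames and the non-key pack holds the remaining 14, which mirrors the "small key pack, dominant non-key pack" split. In a real pipeline the two packs would then go to different encoders, a step this sketch leaves out.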


Source journal

Journal of Visual Communication and Image Representation (Engineering & Technology - Computer Science: Software Engineering)

CiteScore: 5.40
Self-citation rate: 11.50%
Articles published: 188
Review time: 9.9 months

About the journal: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.