Learned Video Compression with Adaptive Temporal Prior and Decoded Motion-aided Quality Enhancement

IF 5.2 · Zone 3, Computer Science · JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS
Jiayu Yang, Chunhui Yang, Fei Xiong, Yongqi Zhai, Ronggang Wang
Journal: ACM Transactions on Multimedia Computing Communications and Applications
Publication date: 2024-04-27
Publication type: Journal Article
Open access: no
DOI: 10.1145/3661824 (https://doi.org/10.1145/3661824)
Citations: 0

Abstract

Learned video compression has recently drawn great attention and shown promising compression performance. In this paper, we focus on two components of the learned video compression framework, namely the conditional entropy model and the quality enhancement module, to improve compression performance. Specifically, we propose an adaptive spatial-temporal entropy model for image, motion, and residual compression; it introduces a temporal prior to reduce the temporal redundancy of latents, together with an additional modulated mask that evaluates similarity and performs refinement. In addition, we propose a quality enhancement module for the predicted frame and the reconstructed frame to improve frame quality and reduce the bitrate cost of residual coding. The module reuses the decoded optical flow as a motion prior and uses deformable convolution to mine high-quality information from the reference frame without spending extra bits. The two proposed coding tools are integrated into a pixel-domain, residual-coding-based compression framework to evaluate their effectiveness. Experimental results demonstrate that our framework achieves competitive compression performance in the low-delay scenario, in terms of PSNR and MS-SSIM, compared with recent learning-based methods and traditional H.265/HEVC. The code is available at OpenLVC.
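
The pixel-domain residual-coding loop mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `warp` and `code_frame` are hypothetical, nearest-neighbour warping stands in for learned motion compensation, and uniform quantization stands in for the learned residual codec. The entropy model, the modulated mask, and the deformable-convolution enhancement module described by the authors are not modelled.

```python
from typing import List, Tuple

Frame = List[List[float]]              # grayscale frame, row-major
Flow = List[List[Tuple[float, float]]]  # per-pixel (dx, dy) displacement


def warp(reference: Frame, flow: Flow) -> Frame:
    """Backward-warp the reference frame with the decoded flow.

    predicted[y][x] = reference[y + dy][x + dx], using nearest-neighbour
    sampling and clamping coordinates to the frame borders.
    """
    h, w = len(reference), len(reference[0])
    predicted = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = flow[y][x]
            sx = min(max(int(round(x + dx)), 0), w - 1)
            sy = min(max(int(round(y + dy)), 0), h - 1)
            row.append(reference[sy][sx])
        predicted.append(row)
    return predicted


def code_frame(current: Frame, reference: Frame, flow: Flow,
               step: float = 4.0) -> Tuple[List[List[int]], Frame]:
    """One predictive-coding step: predict, code the residual, reconstruct."""
    predicted = warp(reference, flow)
    # Residual between the current frame and its motion-compensated prediction.
    residual = [[c - p for c, p in zip(cr, pr)]
                for cr, pr in zip(current, predicted)]
    # Stand-in for the residual codec: uniform scalar quantization.
    symbols = [[round(v / step) for v in row] for row in residual]
    # The decoder adds the dequantized residual back onto the prediction.
    recon = [[p + s * step for p, s in zip(pr, sr)]
             for pr, sr in zip(predicted, symbols)]
    return symbols, recon
```

When the flow describes the motion well, the residual symbols are mostly zero and cost few bits. This is why the abstract ties better predicted-frame quality to a lower residual bitrate.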

Source journal
CiteScore: 8.50
Self-citation rate: 5.90%
Articles published: 285
Review time: 7.5 months
Journal description: The ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It solicits paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome. TOMM is a peer-reviewed, archival journal, available in both print and digital form. The journal is published quarterly, with roughly seven 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure timely publication. The Transactions consists primarily of research papers. This is an archival journal, and it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.