Deep video compression based on Long-range Temporal Context Learning

IF 4.3 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Kejun Wu , Zhenxing Li , You Yang, Qiong Liu
{"title":"Deep video compression based on Long-range Temporal Context Learning","authors":"Kejun Wu ,&nbsp;Zhenxing Li ,&nbsp;You Yang,&nbsp;Qiong Liu","doi":"10.1016/j.cviu.2024.104127","DOIUrl":null,"url":null,"abstract":"<div><p>Video compression allows for efficient storage and transmission of data, benefiting imaging and vision applications, e.g. computational imaging, photography, and displays by delivering high-quality videos. To exploit more informative contexts of video, we propose DVCL, a novel <strong>D</strong>eep <strong>V</strong>ideo <strong>C</strong>ompression based on <strong>L</strong>ong-range Temporal Context Learning. Aiming at high coding performance, this new compression paradigm makes full use of long-range temporal correlations derived from multiple reference frames to learn richer contexts. Motion vectors (MVs) are estimated to represent the motion relations of videos. By employing MVs, a long-range temporal context learning (LTCL) module is presented to extract context information from multiple reference frames, such that a more accurate and informative temporal contexts can be learned and constructed. The long-range temporal contexts serve as conditions and generate the predicted frames by contextual encoder and decoder. To address the challenge of imbalanced training, we develop a multi-stage training strategy to ensure the whole DVCL framework is trained progressively and stably. Extensive experiments demonstrate the proposed DVCL achieves the highest objective and subjective quality, while maintaining relatively low complexity. Specifically, 25.30% and 45.75% bitrate savings on average can be obtained than x265 codec at the same PSNR and MS-SSIM, respectively.</p></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S107731422400208X","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Video compression allows for efficient storage and transmission of data, benefiting imaging and vision applications, e.g. computational imaging, photography, and displays by delivering high-quality videos. To exploit more informative contexts of video, we propose DVCL, a novel Deep Video Compression based on Long-range Temporal Context Learning. Aiming at high coding performance, this new compression paradigm makes full use of long-range temporal correlations derived from multiple reference frames to learn richer contexts. Motion vectors (MVs) are estimated to represent the motion relations of videos. By employing MVs, a long-range temporal context learning (LTCL) module is presented to extract context information from multiple reference frames, such that a more accurate and informative temporal contexts can be learned and constructed. The long-range temporal contexts serve as conditions and generate the predicted frames by contextual encoder and decoder. To address the challenge of imbalanced training, we develop a multi-stage training strategy to ensure the whole DVCL framework is trained progressively and stably. Extensive experiments demonstrate the proposed DVCL achieves the highest objective and subjective quality, while maintaining relatively low complexity. Specifically, 25.30% and 45.75% bitrate savings on average can be obtained than x265 codec at the same PSNR and MS-SSIM, respectively.

基于长程时态上下文学习的深度视频压缩
视频压缩可以高效地存储和传输数据,通过提供高质量的视频,使成像和视觉应用(如计算成像、摄影和显示)受益匪浅。为了利用视频中的更多信息,我们提出了基于长时域上下文学习的新型深度视频压缩技术 DVCL。为了实现高编码性能,这种新的压缩范式充分利用了从多个参考帧中获得的长时相关性来学习更丰富的上下文。通过估算运动矢量(MV)来表示视频的运动关系。通过使用运动矢量,提出了一个长时域上下文学习(LTCL)模块,以从多个参考帧中提取上下文信息,从而学习和构建更准确、更翔实的时域上下文。远程时间上下文作为条件,通过上下文编码器和解码器生成预测帧。为了应对不平衡训练的挑战,我们开发了一种多阶段训练策略,以确保整个 DVCL 框架得到渐进和稳定的训练。广泛的实验证明,所提出的 DVCL 在保持相对较低复杂度的同时,实现了最高的客观和主观质量。具体来说,在相同的 PSNR 和 MS-SSIM 条件下,比 x265 编解码器平均分别节省 25.30% 和 45.75% 的比特率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Computer Vision and Image Understanding
Computer Vision and Image Understanding 工程技术-工程:电子与电气
CiteScore
7.80
自引率
4.40%
发文量
112
审稿时长
79 days
期刊介绍: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views. Research Areas Include: • Theory • Early vision • Data structures and representations • Shape • Range • Motion • Matching and recognition • Architecture and languages • Vision systems
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信