VSRDiff: Learning Inter-Frame Temporal Coherence in Diffusion Model for Video Super-Resolution

IF 3.4 · CAS Tier 3 (Computer Science) · JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS
Linlin Liu;Lele Niu;Jun Tang;Yong Ding
DOI: 10.1109/ACCESS.2025.3529758
Journal: IEEE Access, vol. 13, pp. 11447-11462
Published: 2025-01-14 (Journal Article)
Full text: https://ieeexplore.ieee.org/document/10840194/
Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10840194
Citations: 0

Abstract

Video Super-Resolution (VSR) aims to reconstruct high-quality high-resolution (HR) videos from low-resolution (LR) inputs. Recent studies have explored diffusion models (DMs) for VSR by exploiting their generative priors to produce realistic details. However, the inherent randomness of diffusion models presents significant challenges for controlling content. In particular, current DM-based VSR methods often neglect inter-frame temporal coherence and reconstruction-oriented objectives, leading to visual distortion and temporal inconsistency. In this paper, we introduce VSRDiff, a DM-based framework for VSR that emphasizes inter-frame temporal coherence and adopts a novel reconstruction perspective. Specifically, the Inter-Frame Aggregation Guidance (IFAG) module is developed to learn contextual inter-frame aggregation guidance, alleviating visual distortion caused by the randomness of diffusion models. Furthermore, the Progressive Reconstruction Sampling (PRS) approach is employed to generate reconstruction-oriented latents, balancing fidelity and detail richness. Additionally, temporal consistency is enhanced through second-order bidirectional latent propagation using the Flow-guided Latent Correction (FLC) module. Extensive experiments on the REDS4 and Vid4 datasets demonstrate that VSRDiff achieves highly competitive VSR performance with more realistic details, surpassing existing state-of-the-art methods in both visual fidelity and temporal consistency. Specifically, VSRDiff achieves the best scores on the REDS4 dataset in LPIPS, DISTS, and NIQE, with values of 0.1137, 0.0445, and 2.970, respectively. The result will be released at https://github.com/aigcvsr/VSRDiff.
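The abstract does not specify how the Flow-guided Latent Correction (FLC) module implements its second-order bidirectional latent propagation; the following is only a minimal NumPy sketch of the general idea of warping and blending neighbouring frames' latents along optical flow in both directions. The names `warp`, `bidirectional_propagate`, and the blending weight `alpha` are hypothetical, and the nearest-neighbour warp is a simplification of what a real implementation would use.

```python
import numpy as np

def warp(latent, flow):
    # Hypothetical nearest-neighbour warp of an (H, W, C) latent
    # by an (H, W, 2) flow field (flow[..., 0] = dx, flow[..., 1] = dy).
    H, W, _ = latent.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, W - 1)
    return latent[src_y, src_x]

def bidirectional_propagate(latents, flows_fwd, flows_bwd, alpha=0.5):
    """Sketch of second-order bidirectional propagation: each frame's latent
    is blended with flow-warped latents from its first- and second-order
    neighbours, first in a forward pass, then in a backward pass."""
    T = len(latents)
    out = [l.copy() for l in latents]
    # Forward pass: aggregate information from frames t-1 and t-2 into frame t.
    for t in range(1, T):
        prev = warp(out[t - 1], flows_fwd[t - 1])
        if t >= 2:
            # Crude second-order term: average in the warped t-2 latent.
            prev = 0.5 * (prev + warp(out[t - 2], flows_fwd[t - 1]))
        out[t] = alpha * out[t] + (1 - alpha) * prev
    # Backward pass: aggregate information from frame t+1 into frame t.
    for t in range(T - 2, -1, -1):
        nxt = warp(out[t + 1], flows_bwd[t])
        out[t] = alpha * out[t] + (1 - alpha) * nxt
    return out
```

With zero flow and identical latents across frames, the propagation is a no-op, which is a quick sanity check that the blending weights sum to one.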
Source journal

IEEE Access (COMPUTER SCIENCE, INFORMATION SYSTEMS; ENGINEERING, ELECTRICAL & ELECTRONIC)
CiteScore: 9.80
Self-citation rate: 7.70%
Articles per year: 6673
Review time: 6 weeks
Journal description: IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest. IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either Accept or Reject an article in the form it is submitted in order to achieve rapid turnaround. Especially encouraged are submissions on: multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals; practical articles discussing new experiments or measurement techniques, or interesting solutions to engineering problems; development of new or improved fabrication or manufacturing techniques; and reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.