FSDM: An efficient video super-resolution method based on Frames-Shift Diffusion Model

IF 6 | Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks | Pub Date: 2025-04-03 | DOI: 10.1016/j.neunet.2025.107435

Shijie Yang, Chao Chen, Jie Liu, Jie Tang, Gangshan Wu

Neural Networks, Volume 188, Article 107435 (Journal Article). https://www.sciencedirect.com/science/article/pii/S0893608025003144

Citations: 0

Abstract

Video super-resolution (VSR) is a fundamental task that aims to enhance video quality through sophisticated modeling techniques. Recent advances in diffusion models have significantly improved image super-resolution. However, their integration into video super-resolution workflows remains limited because temporal fusion modules are computationally complex and demand more resources than their image counterparts. To address this challenge, we propose a novel approach: a Frames-Shift Diffusion Model (FSDM) built on image diffusion models. Compared with directly training diffusion-based video super-resolution models, redesigning the diffusion process of an image model without introducing complex temporal modules requires minimal training cost. We incorporate temporal information into the image super-resolution diffusion model by using optical flow and performing multi-frame fusion. The model adapts the diffusion process to transition smoothly from image super-resolution diffusion to video super-resolution diffusion without additional weight parameters. As a result, the Frames-Shift Diffusion Model processes videos frame by frame while remaining computationally efficient. It enhances perceptual quality and achieves PSNR and SSIM comparable to other state-of-the-art diffusion-based VSR methods. This approach improves video super-resolution by simplifying the integration of temporal data, thereby addressing key challenges in the field.
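The abstract states that temporal information enters the image super-resolution diffusion model through optical flow and multi-frame fusion, with no additional weight parameters. The following is a minimal NumPy sketch of a generic, learning-free version of that align-then-fuse step, assuming dense flow fields are already available (for example from an off-the-shelf flow estimator). It is not the authors' FSDM code and does not model the frames-shift diffusion schedule, which the abstract does not specify; the function names `warp_backward` and `fuse_aligned`, the bilinear sampling with border clamping, and the error-based weighting controlled by `sigma` are all illustrative assumptions.

```python
import numpy as np


def warp_backward(frame: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp `frame` (H, W, C) toward a reference frame.

    Hypothetical helper. flow[y, x] = (dx, dy) is the displacement from
    reference pixel (x, y) to its match in `frame`. Samples are bilinear;
    coordinates outside the image are clamped to the border.
    """
    h, w = frame.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Where each reference pixel should read from in `frame`.
    sx = np.clip(xs + flow[..., 0], 0, w - 1)
    sy = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    # Fractional offsets, with a trailing axis so they broadcast over channels.
    wx = (sx - x0)[..., None]
    wy = (sy - y0)[..., None]
    top = (1 - wx) * frame[y0, x0] + wx * frame[y0, x1]
    bottom = (1 - wx) * frame[y1, x0] + wx * frame[y1, x1]
    return (1 - wy) * top + wy * bottom


def fuse_aligned(reference, neighbours, flows, sigma=0.1):
    """Fuse a reference frame with flow-aligned neighbours (no learned weights).

    Hypothetical helper. Each aligned neighbour is weighted per pixel by
    exp(-err / sigma**2), where err is its mean squared difference from the
    reference across channels, so occluded or badly aligned pixels
    contribute little to the result.
    """
    reference = reference.astype(float)
    acc = reference.copy()  # the reference itself has weight 1
    weight_sum = np.ones(reference.shape[:2] + (1,))
    for frame, flow in zip(neighbours, flows):
        aligned = warp_backward(frame.astype(float), flow)
        err = np.mean((aligned - reference) ** 2, axis=-1, keepdims=True)
        weight = np.exp(-err / sigma**2)
        acc += weight * aligned
        weight_sum += weight
    return acc / weight_sum


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.random((8, 10, 3))
    # Neighbour frame = reference content moved 2 pixels to the right.
    nb = np.roll(ref, 2, axis=1)
    flow = np.zeros((8, 10, 2))
    flow[..., 0] = 2.0  # reference pixel x is found at x + 2 in the neighbour
    fused = fuse_aligned(ref, [nb], [flow])
    # Columns 8-9 read from the clamped border, so compare the interior only.
    print(np.abs(fused[:, :8] - ref[:, :8]).max())
```

Backward warping is used here because it assigns every reference pixel exactly one sampled value, avoiding the holes that forward splatting can leave. The exponential weight is just one simple way to keep misaligned pixels from blurring the fused frame; in a diffusion pipeline, a fused frame like this would typically serve as conditioning for the per-frame denoiser, but how FSDM actually uses it is not described in the abstract.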




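Source journal

Neural Networks (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 13.90
Self-citation rate: 7.70%
Articles published: 425
Review time: 67 days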

About the journal: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. The journal invites submissions covering all aspects of neural network research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussion between biology and technology, it aims to encourage the development of biologically inspired artificial intelligence.