Video diffusion generation: comprehensive review and open problems

Impact Factor: 13.9 · CAS Tier 2 (Computer Science) · JCR Q1, Computer Science, Artificial Intelligence
Wenping Ma, Xiaoting Yang, Licheng Jiao, Lingling Li, Xu Liu, Fang Liu, Puhua Chen, Yuting Yang, Mengru Ma, Long Sun, Ruohan Zhang, Xueli Geng, Yuwei Guo, Shuyuan Yang, Zhixi Feng
DOI: 10.1007/s10462-025-11331-6
Journal: Artificial Intelligence Review, vol. 58, no. 11
Publication date: 2025-08-20 (Journal Article)
Article page: https://link.springer.com/article/10.1007/s10462-025-11331-6
Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10462-025-11331-6.pdf
Citation count: 0

Abstract

Video generation has become an increasingly important component of AI-generated content (AIGC), owing to its rich semantic expressiveness and growing application potential. Among various generative paradigms, diffusion models have recently gained prominence due to their strong controllability, competitive visual quality, and compatibility with multimodal inputs. However, most existing surveys provide limited coverage of diffusion-based video generation, often lacking systematic analysis and comprehensive comparisons. To address this gap, this paper presents a thorough and structured review of diffusion models for video generation. We first outline the theoretical foundations and core architectures of diffusion models, and then introduce the key design principles of representative video generation methods. We propose a unified taxonomy that categorizes over two hundred methods, analyzing their key characteristics, strengths, and limitations. In addition, we compare the performance of classical methods and summarize commonly used datasets and evaluation metrics in this field to ease model benchmarking and selection. Finally, we discuss open problems and future research directions, aiming to provide a valuable reference for both academic research and practical development.

Source journal: Artificial Intelligence Review (Engineering & Technology — Computer Science: Artificial Intelligence)
CiteScore: 22.00
Self-citation rate: 3.30%
Articles per year: 194
Average review time: 5.3 months
Journal description: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.