Video diffusion generation: comprehensive review and open problems

Impact Factor: 13.9 · CAS Tier 2 (Computer Science) · JCR Q1, Computer Science, Artificial Intelligence
Wenping Ma, Xiaoting Yang, Licheng Jiao, Lingling Li, Xu Liu, Fang Liu, Puhua Chen, Yuting Yang, Mengru Ma, Long Sun, Ruohan Zhang, Xueli Geng, Yuwei Guo, Shuyuan Yang, Zhixi Feng
DOI: 10.1007/s10462-025-11331-6
Journal: Artificial Intelligence Review, vol. 58, no. 11
Publication date: 2025-08-20 (Journal Article)
Article page: https://link.springer.com/article/10.1007/s10462-025-11331-6
Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10462-025-11331-6.pdf
Citation count: 0

Abstract

Video generation has become an increasingly important component of AI-generated content (AIGC), owing to its rich semantic expressiveness and growing application potential. Among various generative paradigms, diffusion models have recently gained prominence due to their strong controllability, competitive visual quality, and compatibility with multimodal inputs. However, most existing surveys provide limited coverage of diffusion-based video generation, often lacking systematic analysis and comprehensive comparisons. To address this gap, this paper presents a thorough and structured review of diffusion models for video generation. We first outline the theoretical foundations and core architectures of diffusion models, and then introduce the key design principles of representative video generation methods. We propose a unified taxonomy that categorizes over two hundred methods, analyzing their key characteristics, strengths, and limitations. In addition, we compare the performance of classical methods and summarize commonly used datasets and evaluation metrics in this field to ease model benchmarking and selection. Finally, we discuss open problems and future research directions, aiming to provide a valuable reference for both academic research and practical development.

Source journal: Artificial Intelligence Review (Engineering & Technology — Computer Science: Artificial Intelligence)
CiteScore: 22.00
Self-citation rate: 3.30%
Articles per year: 194
Average review time: 5.3 months
Journal description: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.