A Comprehensive Survey on Text-to-Video Generation

IF 3.0 · Region 4 (Computer Science) · Q3 ENGINEERING, ELECTRICAL & ELECTRONIC
Fan Xie;Dan Zeng;Qiaomu Shen;Bo Tang
DOI: 10.23919/cje.2024.00.151
Journal: Chinese Journal of Electronics, vol. 34, no. 4, pp. 1009-1036
Publication date: 2025-07-01
Full text: https://ieeexplore.ieee.org/document/11151234/
Citations: 0

Abstract
Since the release of Sora, the text-to-video (T2V) generation has brought profound changes to artificial intelligence-generated content. T2V generation aims to generate high-quality videos based on a given text description, which is challenging due to the lack of large-scale, high-quality text-video pairs for training and the complexity of modeling high-dimensional video data. Although there have been some valuable and impressive surveys on T2V generation, these surveys introduce approaches in a relatively isolated way, lack the development of evaluation metrics, and lack the latest advances in T2V generation since 2023. Due to the rapid expansion of the field of T2V generation, a comprehensive review of the relevant studies is both necessary and challenging. This survey attempts to connect and systematize existing research in a comprehensive way. Unlike previous surveys, this survey reviews nearly one hundred representative T2V generation approaches and includes the latest method published on July 2024 from the perspectives of model, data, evaluation metrics, and available open source. It may help readers better understand the current research status and ideas and have a quick start with accessible open-source models. Finally, the future challenges and method trends of T2V generation are thoroughly discussed.
Source Journal
Chinese Journal of Electronics
Field: Engineering Technology - Engineering: Electrical & Electronic
CiteScore: 3.70
Self-citation rate: 16.70%
Annual articles: 342
Review time: 12.0 months
Journal description: CJE focuses on the emerging fields of electronics, publishing innovative and transformative research papers. Most of the papers published in CJE are from universities and research institutes, presenting their innovative research results. Both theoretical and practical contributions are encouraged, and original research papers reporting novel solutions to hot topics in electronics are strongly recommended.