Human Motion Video Generation: A Survey

Impact factor (IF): 18.6
Haiwei Xue;Xiangyang Luo;Zhanghao Hu;Xin Zhang;Xunzhi Xiang;Yuqin Dai;Jianzhuang Liu;Zhensong Zhang;Minglei Li;Jian Yang;Fei Ma;Zhiyong Wu;Changpeng Yang;Zonghong Dai;Fei Richard Yu
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 11, pp. 10709-10730
Publication date: 2025-07-31
Publication type: Journal Article
DOI: 10.1109/TPAMI.2025.3594034
URL: https://ieeexplore.ieee.org/document/11106267/
Citations: 0

Abstract

Human motion video generation has garnered significant research interest due to its broad applications, enabling innovations such as photorealistic singing heads or dynamic avatars that seamlessly dance to music. However, existing surveys in this field focus on individual methods, lacking a comprehensive overview of the entire generative process. This paper addresses this gap by providing an in-depth survey of human motion video generation, encompassing over ten sub-tasks, and detailing the five key phases of the generation process: input, motion planning, motion video generation, refinement, and output. Notably, this is the first survey that discusses the potential of large language models in enhancing human motion video generation. Our survey reviews the latest developments and technological trends in human motion video generation across three primary modalities: vision, text, and audio. By covering over two hundred papers, we offer a thorough overview of the field and highlight milestone works that have driven significant technological breakthroughs. Our goal for this survey is to unveil the prospects of human motion video generation and serve as a valuable resource for advancing the comprehensive applications of digital humans.