AMG: Avatar Motion Guided Video Generation

Zhangsihao Yang, Mengyi Shan, Mohammad Farazi, Wenhui Zhu, Yanxi Chen, Xuanzhao Dong, Yalin Wang
{"title":"AMG:阿凡达动作引导视频生成器","authors":"Zhangsihao Yang, Mengyi Shan, Mohammad Farazi, Wenhui Zhu, Yanxi Chen, Xuanzhao Dong, Yalin Wang","doi":"arxiv-2409.01502","DOIUrl":null,"url":null,"abstract":"Human video generation task has gained significant attention with the\nadvancement of deep generative models. Generating realistic videos with human\nmovements is challenging in nature, due to the intricacies of human body\ntopology and sensitivity to visual artifacts. The extensively studied 2D media\ngeneration methods take advantage of massive human media datasets, but struggle\nwith 3D-aware control; whereas 3D avatar-based approaches, while offering more\nfreedom in control, lack photorealism and cannot be harmonized seamlessly with\nbackground scene. We propose AMG, a method that combines the 2D photorealism\nand 3D controllability by conditioning video diffusion models on controlled\nrendering of 3D avatars. We additionally introduce a novel data processing\npipeline that reconstructs and renders human avatar movements from dynamic\ncamera videos. AMG is the first method that enables multi-person diffusion\nvideo generation with precise control over camera positions, human motions, and\nbackground style. We also demonstrate through extensive evaluation that it\noutperforms existing human video generation methods conditioned on pose\nsequences or driving videos in terms of realism and adaptability.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"136 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"AMG: Avatar Motion Guided Video Generation\",\"authors\":\"Zhangsihao Yang, Mengyi Shan, Mohammad Farazi, Wenhui Zhu, Yanxi Chen, Xuanzhao Dong, Yalin Wang\",\"doi\":\"arxiv-2409.01502\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Human video generation task has gained significant attention with the\\nadvancement of deep generative models. Generating realistic videos with human\\nmovements is challenging in nature, due to the intricacies of human body\\ntopology and sensitivity to visual artifacts. The extensively studied 2D media\\ngeneration methods take advantage of massive human media datasets, but struggle\\nwith 3D-aware control; whereas 3D avatar-based approaches, while offering more\\nfreedom in control, lack photorealism and cannot be harmonized seamlessly with\\nbackground scene. We propose AMG, a method that combines the 2D photorealism\\nand 3D controllability by conditioning video diffusion models on controlled\\nrendering of 3D avatars. We additionally introduce a novel data processing\\npipeline that reconstructs and renders human avatar movements from dynamic\\ncamera videos. AMG is the first method that enables multi-person diffusion\\nvideo generation with precise control over camera positions, human motions, and\\nbackground style. 
We also demonstrate through extensive evaluation that it\\noutperforms existing human video generation methods conditioned on pose\\nsequences or driving videos in terms of realism and adaptability.\",\"PeriodicalId\":501174,\"journal\":{\"name\":\"arXiv - CS - Graphics\",\"volume\":\"136 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Graphics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.01502\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.01502","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The task of human video generation has gained significant attention with the advancement of deep generative models. Generating realistic videos of human movement is inherently challenging due to the intricacy of human body topology and viewers' sensitivity to visual artifacts. Extensively studied 2D media generation methods take advantage of massive human media datasets but struggle with 3D-aware control, whereas 3D avatar-based approaches, while offering more freedom of control, lack photorealism and cannot be harmonized seamlessly with the background scene. We propose AMG, a method that combines 2D photorealism with 3D controllability by conditioning video diffusion models on controlled renderings of 3D avatars. We additionally introduce a novel data processing pipeline that reconstructs and renders human avatar movements from dynamic-camera videos. AMG is the first method that enables multi-person diffusion video generation with precise control over camera positions, human motions, and background style. We also demonstrate through extensive evaluation that it outperforms existing human video generation methods conditioned on pose sequences or driving videos in terms of realism and adaptability.
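To make the conditioning idea concrete, here is a minimal sketch, not the authors' implementation: a toy video denoiser that receives rendered avatar frames as extra input channels concatenated to the noisy video latent, so camera position and human motion are controlled through the render. All module names, shapes, and the noising schedule below are illustrative assumptions.

```python
# Minimal sketch (assumed design, not the AMG code): conditioning a video
# diffusion denoiser on rendered 3D-avatar frames via channel concatenation.
import torch
import torch.nn as nn

class AvatarConditionedDenoiser(nn.Module):
    """Toy denoiser: predicts noise for a video latent given rendered
    avatar frames (pose/camera control) stacked as extra channels."""
    def __init__(self, latent_ch=4, cond_ch=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            # 3D convolutions mix information across time (frames) and space.
            nn.Conv3d(latent_ch + cond_ch, hidden, 3, padding=1),
            nn.SiLU(),
            nn.Conv3d(hidden, hidden, 3, padding=1),
            nn.SiLU(),
            nn.Conv3d(hidden, latent_ch, 3, padding=1),
        )

    def forward(self, noisy_latent, avatar_render):
        # noisy_latent:  (B, latent_ch, T, H, W) noised video latent
        # avatar_render: (B, cond_ch,  T, H, W) avatar renders, same frames
        x = torch.cat([noisy_latent, avatar_render], dim=1)
        return self.net(x)  # predicted noise, same shape as noisy_latent

# One training step of the standard denoising objective,
# || eps - eps_theta(x_t, render) ||^2, with a simple linear noising schedule:
model = AvatarConditionedDenoiser()
latent = torch.randn(2, 4, 8, 32, 32)   # clean video latent (B, C, T, H, W)
render = torch.randn(2, 3, 8, 32, 32)   # avatar renders for the same frames
t = torch.rand(2, 1, 1, 1, 1)           # noise level in [0, 1]
eps = torch.randn_like(latent)
noisy = (1 - t) * latent + t * eps      # interpolate toward pure noise
loss = ((model(noisy, render) - eps) ** 2).mean()
loss.backward()
```

At sampling time, the same render stream would be fed at every denoising step, which is what lets one motion or camera trajectory steer the generated video while the diffusion prior supplies photorealistic appearance and background.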