Transferring Human Motion and Appearance in Monocular Videos

Thiago L. Gomes, Renato Martins, Erickson R. Nascimento
{"title":"在单目视频中转移人体运动和外观","authors":"Thiago L. Gomes, Renato Martins, Erickson R. Nascimento","doi":"10.5753/sibgrapi.est.2022.23256","DOIUrl":null,"url":null,"abstract":"This thesis investigates the problem of transferring human motion and appearance from video to video preserving motion features, body shape, and visual quality. In other words, given two input videos, we investigate how to synthesize a new video, where a target person from the first video is placed into a new context performing different motions from the second video. Possible application domains are on graphics animations and entertainment media that rely on synthetic characters and virtual environments to create visual content. We introduce two novel methods for transferring appearance and retargeting human motion from monocular videos, and by consequence, increase the creative possibilities of visual content. Differently from recent appearance transferring methods, our approaches take into account 3D shape, appearance, and motion constraints. Specifically, our first method is based on a hybrid image-based rendering technique that exhibits competitive visual retargeting quality compared to state-of-the-art neural rendering approaches, even without computationally intensive training. Then, inspired by the advantages of the first method, we designed an end-to-end learning-based transferring strategy. Taking advantages of both differentiable rendering and the 3D parametric model, our second data-driven method produces a fully 3D controllable human model, i.e., the user can control the human pose and rendering parameters. Experiments on different videos show that our methods preserve specific features of the motion that must be maintained (e.g., feet touching the floor, hands touching a particular object) while holding the best values for appearance in terms of Structural Similarity (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), Mean Squared Error (MSE), and Fréchet Video Distance (FVD). We also provide to the community a new dataset composed of several annotated videos with motion constraints for retargeting applications and paired motion sequences from different characters to evaluate transferring approaches.","PeriodicalId":182158,"journal":{"name":"Anais Estendidos do XXXV Conference on Graphics, Patterns and Images (SIBGRAPI Estendido 2022)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Transferring Human Motion and Appearance in Monocular Videos\",\"authors\":\"Thiago L. Gomes, Renato Martins, Erickson R. Nascimento\",\"doi\":\"10.5753/sibgrapi.est.2022.23256\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This thesis investigates the problem of transferring human motion and appearance from video to video preserving motion features, body shape, and visual quality. In other words, given two input videos, we investigate how to synthesize a new video, where a target person from the first video is placed into a new context performing different motions from the second video. Possible application domains are on graphics animations and entertainment media that rely on synthetic characters and virtual environments to create visual content. We introduce two novel methods for transferring appearance and retargeting human motion from monocular videos, and by consequence, increase the creative possibilities of visual content. 
Differently from recent appearance transferring methods, our approaches take into account 3D shape, appearance, and motion constraints. Specifically, our first method is based on a hybrid image-based rendering technique that exhibits competitive visual retargeting quality compared to state-of-the-art neural rendering approaches, even without computationally intensive training. Then, inspired by the advantages of the first method, we designed an end-to-end learning-based transferring strategy. Taking advantages of both differentiable rendering and the 3D parametric model, our second data-driven method produces a fully 3D controllable human model, i.e., the user can control the human pose and rendering parameters. Experiments on different videos show that our methods preserve specific features of the motion that must be maintained (e.g., feet touching the floor, hands touching a particular object) while holding the best values for appearance in terms of Structural Similarity (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), Mean Squared Error (MSE), and Fréchet Video Distance (FVD). We also provide to the community a new dataset composed of several annotated videos with motion constraints for retargeting applications and paired motion sequences from different characters to evaluate transferring approaches.\",\"PeriodicalId\":182158,\"journal\":{\"name\":\"Anais Estendidos do XXXV Conference on Graphics, Patterns and Images (SIBGRAPI Estendido 2022)\",\"volume\":\"4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Anais Estendidos do XXXV Conference on Graphics, Patterns and Images (SIBGRAPI Estendido 2022)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5753/sibgrapi.est.2022.23256\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Anais Estendidos do XXXV Conference on Graphics, Patterns and Images (SIBGRAPI Estendido 2022)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5753/sibgrapi.est.2022.23256","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

This thesis investigates the problem of transferring human motion and appearance from video to video while preserving motion features, body shape, and visual quality. In other words, given two input videos, we investigate how to synthesize a new video in which the target person from the first video is placed into a new context, performing motions taken from the second video. Possible application domains include graphics animation and entertainment media that rely on synthetic characters and virtual environments to create visual content. We introduce two novel methods for transferring appearance and retargeting human motion from monocular videos and, as a consequence, increase the creative possibilities for visual content. Unlike recent appearance transfer methods, our approaches take into account 3D shape, appearance, and motion constraints. Specifically, our first method is based on a hybrid image-based rendering technique that exhibits visual retargeting quality competitive with state-of-the-art neural rendering approaches, even without computationally intensive training. Then, inspired by the advantages of the first method, we designed an end-to-end learning-based transfer strategy. Taking advantage of both differentiable rendering and a 3D parametric body model, our second, data-driven method produces a fully controllable 3D human model, i.e., the user can control the human pose and the rendering parameters. Experiments on different videos show that our methods preserve specific features of the motion that must be maintained (e.g., feet touching the floor, hands touching a particular object) while achieving the best appearance scores in terms of Structural Similarity (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), Mean Squared Error (MSE), and Fréchet Video Distance (FVD). We also provide the community with a new dataset composed of several annotated videos with motion constraints for retargeting applications, as well as paired motion sequences from different characters for evaluating transfer approaches.
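As an illustration of the appearance metrics named in the abstract, the sketch below computes SSIM, LPIPS, and MSE between a synthesized frame and a reference frame. It is a minimal example assuming the scikit-image and lpips Python packages and hypothetical placeholder frame arrays; it is not the evaluation code used in the thesis, and FVD is omitted because it requires a pretrained I3D video network and full frame sequences.

```python
import numpy as np
import torch
import lpips
from skimage.metrics import structural_similarity

# Hypothetical frames: HxWx3 uint8 arrays (in practice, loaded from the
# synthesized and reference videos).
generated = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
reference = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

# SSIM over RGB channels, pixel values in [0, 255].
ssim = structural_similarity(reference, generated, channel_axis=-1, data_range=255)

# MSE on raw pixel intensities.
mse = np.mean((reference.astype(np.float64) - generated.astype(np.float64)) ** 2)

# LPIPS expects NCHW tensors scaled to [-1, 1].
def to_tensor(img):
    t = torch.from_numpy(img).permute(2, 0, 1).float() / 127.5 - 1.0
    return t.unsqueeze(0)

lpips_fn = lpips.LPIPS(net="alex")  # AlexNet backbone, the default in the LPIPS package
with torch.no_grad():
    lpips_score = lpips_fn(to_tensor(reference), to_tensor(generated)).item()

print(f"SSIM={ssim:.4f}  MSE={mse:.2f}  LPIPS={lpips_score:.4f}")
```

In practice these per-frame scores would be averaged over all frames of each retargeted video before comparing methods.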