PoseVocab: Learning Joint-structured Pose Embeddings for Human Avatar Modeling

ACM SIGGRAPH 2023 Conference Proceedings. Publication date: 2023-04-25. DOI: 10.1145/3588432.3591490

Zhe Li, Zerong Zheng, Yuxiao Liu, Boyao Zhou, Yebin Liu


Citations: 7

Abstract



Creating pose-driven human avatars is about modeling the mapping from the low-frequency driving pose to high-frequency dynamic human appearances, so an effective pose encoding method that can encode high-fidelity human details is essential to human avatar modeling. To this end, we present PoseVocab, a novel pose encoding method that encourages the network to discover the optimal pose embeddings for learning the dynamic human appearance. Given multi-view RGB videos of a character, PoseVocab constructs key poses and latent embeddings based on the training poses. To achieve pose generalization and temporal consistency, we sample key rotations in so(3) of each joint rather than the global pose vectors, and assign a pose embedding to each sampled key rotation. These joint-structured pose embeddings not only encode the dynamic appearances under different key poses, but also factorize the global pose embedding into joint-structured ones to better learn the appearance variation related to the motion of each joint. To improve the representation ability of the pose embedding while maintaining memory efficiency, we introduce feature lines, a compact yet effective 3D representation, to model more fine-grained details of human appearances. Furthermore, given a query pose and a spatial position, a hierarchical query strategy is introduced to interpolate pose embeddings and acquire the conditional pose feature for dynamic human synthesis. Overall, PoseVocab effectively encodes the dynamic details of human appearance and enables realistic and generalized animation under novel poses. Experiments show that our method outperforms other state-of-the-art baselines both qualitatively and quantitatively in terms of synthesis quality. Code is available at https://github.com/lizhe00/PoseVocab.
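The joint-structured lookup described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation (that lives in the linked repository); it is a minimal toy under stated assumptions. Key rotations are picked per joint by farthest-point sampling, distances are plain Euclidean distances between axis-angle vectors, and a query rotation is interpolated from its k nearest key rotations with inverse-distance weights. The abstract does not specify any of these choices, and the names `sample_key_rotations`, `JointVocab`, and `query_pose_feature` are hypothetical. Feature lines and the spatial part of the hierarchical query are left out.

```python
import math
import random


def rotation_distance(a, b):
    """Euclidean distance between two axis-angle vectors.

    Assumption: a crude stand-in for a proper distance between rotations.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample_key_rotations(rotations, num_keys):
    """Pick key rotations for one joint by farthest-point sampling (assumed scheme)."""
    keys = [rotations[0]]
    while len(keys) < min(num_keys, len(rotations)):
        # Add the training rotation that is farthest from all keys chosen so far.
        farthest = max(
            rotations,
            key=lambda r: min(rotation_distance(r, k) for k in keys),
        )
        keys.append(farthest)
    return keys


class JointVocab:
    """Key rotations of one joint, with one embedding per key rotation."""

    def __init__(self, key_rotations, embed_dim, rng):
        self.key_rotations = key_rotations
        # In a real system these would be learnable parameters; here they are
        # just small random vectors.
        self.embeddings = [
            [rng.gauss(0.0, 0.01) for _ in range(embed_dim)]
            for _ in key_rotations
        ]

    def query(self, rotation, k=2, eps=1e-8):
        """Interpolate the embeddings of the k nearest key rotations."""
        nearest = sorted(
            (rotation_distance(rotation, key), i)
            for i, key in enumerate(self.key_rotations)
        )[:k]
        weights = [1.0 / (dist + eps) for dist, _ in nearest]
        total = sum(weights)
        out = [0.0] * len(self.embeddings[0])
        for weight, (_, index) in zip(weights, nearest):
            for c, value in enumerate(self.embeddings[index]):
                out[c] += (weight / total) * value
        return out


def query_pose_feature(vocabs, pose):
    """Concatenate the per-joint interpolated embeddings of a full pose."""
    feature = []
    for vocab, rotation in zip(vocabs, pose):
        feature.extend(vocab.query(rotation))
    return feature


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy training rotations (axis-angle) for two joints.
    training = [
        [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (0.0, 0.8, 0.0), (0.0, 0.4, 0.0)],
    ]
    vocabs = [JointVocab(sample_key_rotations(r, 2), 4, rng) for r in training]
    novel_pose = [(0.3, 0.0, 0.0), (0.0, 0.6, 0.0)]
    print(len(query_pose_feature(vocabs, novel_pose)))
```

Because each joint keeps its own small set of key rotations, a novel pose only has to be close to seen rotations joint by joint, not as a whole pose vector. This is the intuition behind sampling per joint rather than sampling global pose vectors.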
