TextDeformer: Geometry Manipulation using Text Guidance

ACM SIGGRAPH 2023 Conference Proceedings. Pub Date: 2023-04-26. DOI: 10.1145/3588432.3591552

William Gao, Noam Aigerman, Thibault Groueix, Vladimir G. Kim, R. Hanocka

Citations: 16

Abstract

We present a technique for automatically producing a deformation of an input triangle mesh, guided solely by a text prompt. Our framework is capable of deformations that produce both large, low-frequency shape changes and small, high-frequency details. It relies on differentiable rendering to connect geometry to powerful pre-trained image encoders, such as CLIP and DINO. Notably, updating mesh geometry by taking gradient steps through differentiable rendering is notoriously challenging, commonly resulting in deformed meshes with significant artifacts. These difficulties are amplified by noisy and inconsistent gradients from CLIP. To overcome this limitation, we opt to represent our mesh deformation through Jacobians, which update deformations in a global, smooth manner (rather than through locally suboptimal steps). Our key observation is that Jacobians are a representation that favors smoother, larger deformations, leading to a global relation between vertices and pixels and avoiding localized noisy gradients. Additionally, to ensure the resulting shape is coherent from all 3D viewpoints, we encourage the deep features computed on the 2D encoding of the rendering to be consistent for a given vertex from all viewpoints. We demonstrate that our method is capable of smoothly deforming a wide variety of source meshes toward a wide variety of target text prompts, achieving both large modifications to, e.g., body proportions of animals, and the addition of fine semantic details, such as shoelaces on an army boot and fine details of a face.
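The claim that a Jacobian representation turns a localized, noisy gradient into a global, smooth update can be illustrated with a toy 1D analog. The sketch below is not the authors' implementation: it assumes a chain of vertices with the first one pinned, treats the per-edge differences as stand-ins for per-triangle Jacobians, and recovers positions by summing those differences (on a real mesh this recovery would be a least-squares solve, which is an assumption here since the abstract does not describe it). All function names are hypothetical.

```python
from itertools import accumulate


def integrate(jacobians, anchor=0.0):
    """Recover chain vertex positions from per-edge differences (toy 'Jacobians')."""
    return [anchor] + [anchor + s for s in accumulate(jacobians)]


def pull_back(vertex_grad):
    """Chain rule: dLoss/dJ_i is the sum of dLoss/dv_k over all vertices k > i,
    because edge i shifts every vertex that comes after it."""
    suffix_sums = list(accumulate(reversed(vertex_grad[1:])))
    return list(reversed(suffix_sums))


def step_vertices(verts, vertex_grad, lr):
    """Baseline: each vertex follows only its own gradient entry."""
    return [v - lr * g for v, g in zip(verts, vertex_grad)]


def step_jacobians(verts, vertex_grad, lr):
    """Take the gradient step on the differences, then re-integrate positions."""
    jac = [b - a for a, b in zip(verts, verts[1:])]
    jac_grad = pull_back(vertex_grad)
    new_jac = [j - lr * g for j, g in zip(jac, jac_grad)]
    return integrate(new_jac, anchor=verts[0])


if __name__ == "__main__":
    verts = [0, 1, 2, 3, 4, 5]
    # Hypothetical noisy gradient that touches a single vertex only.
    grad = [0, 0, 0, 1, 0, 0]
    print("per-vertex step:", step_vertices(verts, grad, lr=0.5))
    print("jacobian step:  ", step_jacobians(verts, grad, lr=0.5))
```

With this input, the per-vertex step displaces vertex 3 alone by 0.5, producing an isolated spike, while the Jacobian step shifts vertices 1 through 5 by 0.5, 1.0, 1.5, 1.5, and 1.5, so the same gradient is spread coherently along the chain. The total displacement is larger in the second case; the point of the sketch is only the change from a local to a global update, not a like-for-like step size.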
