Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation

arXiv - CS - Graphics | Pub Date: 2024-09-05 | DOI: arxiv-2409.03718

Slava Elizarov, Ciara Rowles, Simon Donné


Abstract

Generating high-quality 3D objects from textual descriptions remains a challenging problem due to computational cost, the scarcity of 3D data, and complex 3D representations. We introduce Geometry Image Diffusion (GIMDiffusion), a novel Text-to-3D model that utilizes geometry images to efficiently represent 3D shapes using 2D images, thereby avoiding the need for complex 3D-aware architectures. By integrating a Collaborative Control mechanism, we exploit the rich 2D priors of existing Text-to-Image models such as Stable Diffusion. This enables strong generalization even with limited 3D training data (allowing us to use only high-quality training data) as well as retaining compatibility with guidance techniques such as IPAdapter. In short, GIMDiffusion enables the generation of 3D assets at speeds comparable to current Text-to-Image models. The generated objects consist of semantically meaningful, separate parts and include internal structures, enhancing both usability and versatility.
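To make the central idea concrete, here is a minimal sketch of what a geometry image is: a regular 2D grid whose pixels store 3D surface positions, so that a mesh can be read back by connecting neighbouring pixels. This illustrates the general concept only and is not the authors' pipeline. The function names `sphere_geometry_image` and `geometry_image_to_mesh`, the latitude-longitude sphere used as toy input, and the 16 x 16 resolution are all assumptions made for this example.

```python
import math


def sphere_geometry_image(n):
    """Build an n x n toy geometry image of a unit sphere.

    Each "pixel" stores an (x, y, z) surface position instead of a colour.
    The latitude-longitude layout is an assumption chosen for simplicity.
    """
    image = []
    for i in range(n):
        theta = math.pi * i / (n - 1)          # polar angle in [0, pi]
        row = []
        for j in range(n):
            phi = 2.0 * math.pi * j / (n - 1)  # azimuth in [0, 2*pi]
            row.append((
                math.sin(theta) * math.cos(phi),
                math.sin(theta) * math.sin(phi),
                math.cos(theta),
            ))
        image.append(row)
    return image


def geometry_image_to_mesh(image):
    """Convert a geometry image into (vertices, triangles).

    Every pixel becomes one vertex, and every 2x2 block of neighbouring
    pixels is split into two triangles, so connectivity is implied by the
    grid and never has to be stored explicitly.
    """
    h, w = len(image), len(image[0])
    vertices = [image[i][j] for i in range(h) for j in range(w)]
    triangles = []
    for i in range(h - 1):
        for j in range(w - 1):
            a = i * w + j      # top-left pixel of the 2x2 block
            b = a + 1          # top-right
            c = a + w          # bottom-left
            d = c + 1          # bottom-right
            triangles.append((a, b, c))
            triangles.append((b, d, c))
    return vertices, triangles


image = sphere_geometry_image(16)
vertices, triangles = geometry_image_to_mesh(image)
print(len(vertices), len(triangles))  # prints "256 450"
```

Because the shape lives entirely in an image-like array, it is plausible that a 2D image model can produce it directly, which is the property the abstract relies on. Note that this toy parameterization yields degenerate triangles at the poles, which a practical parameterization would need to avoid.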


