Phu Pham, Aradhya N. Mathur, Ojaswa Sharma, Aniket Bera
arXiv - CS - Graphics · arxiv-2409.06620 · Published 2024-09-10
MVGaussian: High-Fidelity Text-to-3D Content Generation with Multi-View Guidance and Surface Densification
The field of text-to-3D content generation has made significant progress in
generating realistic 3D objects, with existing methodologies like Score
Distillation Sampling (SDS) offering promising guidance. However, these methods
often encounter the "Janus" problem: multi-face ambiguities caused by imprecise
guidance. Additionally, while recent advances in 3D Gaussian splatting have
shown its efficacy in representing 3D volumes, optimization of this
representation remains largely unexplored. This paper introduces a unified
framework for text-to-3D content generation that addresses these critical gaps.
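For background, the standard SDS gradient (the DreamFusion formulation, reproduced here for reference rather than as a contribution of this paper) drives a rendered image x = g(\theta) toward the text-conditioned diffusion prior:

\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right]

where x_t is the noised render, y the text prompt, \hat{\epsilon}_\phi the diffusion model's noise prediction, \epsilon the injected noise, and w(t) a timestep weighting. Because each update is guided from a single 2D view at a time, the prior can favor a canonical front-facing appearance from every viewpoint, which is the multi-face "Janus" failure noted above.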
Our approach utilizes multi-view guidance to iteratively form the structure of
the 3D model, progressively enhancing detail and accuracy. We also introduce a
novel densification algorithm that aligns Gaussians close to the surface,
improving the structural integrity and fidelity of the generated models.
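The abstract does not spell this algorithm out, so the following is only a minimal NumPy sketch of one plausible reading of "aligning Gaussians close to the surface": pull each Gaussian center part-way toward its nearest estimated surface sample, then clone the Gaussians that land within a thin band around the surface. Every name here (surface_align_and_densify, the eps band, the jitter rule) is hypothetical rather than taken from the paper.

import numpy as np

def surface_align_and_densify(centers, surface_pts, eps=0.05, step=0.5):
    # Hypothetical illustration, not the paper's algorithm.
    # 1) Find each Gaussian's nearest estimated surface sample (brute force).
    d2 = ((centers[:, None, :] - surface_pts[None, :, :]) ** 2).sum(axis=-1)
    nearest = surface_pts[d2.argmin(axis=1)]          # (N, 3)
    # 2) Soft alignment: move each center part-way toward the surface.
    centers = centers + step * (nearest - centers)
    # 3) Densify: clone Gaussians now inside a thin band around the surface,
    #    with a small jitter so clones can specialize during optimization.
    dist = np.linalg.norm(centers - nearest, axis=1)
    near = dist < eps
    jitter = 0.5 * eps * np.random.default_rng(0).standard_normal((near.sum(), 3))
    return np.concatenate([centers, centers[near] + jitter], axis=0)

A toy invocation, using random points on a unit sphere as the estimated surface:

rng = np.random.default_rng(1)
centers = rng.normal(size=(100, 3))                      # random Gaussian centers
sphere = rng.normal(size=(500, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)  # unit-sphere "surface"
new_centers = surface_align_and_densify(centers, sphere)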
Extensive experiments validate our approach, demonstrating that it produces
high-quality visual outputs with minimal time cost. Notably, our method
achieves high-quality results within half an hour of training, offering a
substantial efficiency gain over most existing methods, which require hours of
training time to achieve comparable results.
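To make the multi-view idea concrete, here is a hedged sketch under assumed machinery: apply the 2D guidance from several cameras spread evenly around the object in each iteration, so that no single viewpoint dominates and front/back ambiguity is suppressed. The render and guidance_grad callables are stand-in stubs, not the authors' implementation, and sample_cameras is an assumed pose scheme.

import numpy as np

def sample_cameras(n_views, radius=2.5, elev_deg=15.0):
    # Evenly spaced azimuths at a fixed elevation, so every side of the
    # object is supervised within each iteration (assumed scheme).
    az = np.linspace(0.0, 2.0 * np.pi, n_views, endpoint=False)
    el = np.deg2rad(elev_deg)
    return np.stack([radius * np.cos(el) * np.cos(az),
                     radius * np.cos(el) * np.sin(az),
                     np.full(n_views, radius * np.sin(el))], axis=1)

def multi_view_step(gaussians, prompt, render, guidance_grad, n_views=4):
    # `render` and `guidance_grad` are stand-in stubs: the former would
    # rasterize the Gaussians from a camera position, the latter would
    # return an SDS-style gradient image from the diffusion prior.
    grads = [guidance_grad(render(gaussians, cam), prompt, cam)
             for cam in sample_cameras(n_views)]
    return np.mean(grads, axis=0)   # guidance averaged across views

# Toy invocation with dummy stubs, just to show the call shape.
step_grad = multi_view_step(
    np.zeros((10, 3)), "a ceramic teapot",
    render=lambda gs, cam: np.zeros((64, 64, 3)),
    guidance_grad=lambda img, prompt, cam: np.zeros((64, 64, 3)),
)

Averaging the guidance over a ring of cameras before each update, rather than updating from one random view at a time, is one simple way to keep any single viewpoint from imprinting a duplicate face on the model.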